In this thesis, we develop efficient algorithms for three problems that arise in electronic computer aided design (ECAD): (1) component stack folding, (2) standard and custom cell folding, and (3) planar topological routing.

The component stack folding problem arises in the layout of bit-slice architectures. We consider two versions of this problem. In both, the components have equal width and, when a stack is folded, a routing penalty is incurred at the fold. In the first version, the height of the folded layout is given and we have to minimize the width. In the second, the width of the folded layout is given and its height is to be minimized. We develop a normalization technique that permits the first version to be solved in linear time by a greedy algorithm. The second version can be solved efficiently using normalization and parametric search.

In standard and custom cell folding, the component list is folded into rows and a routing penalty is incurred between two rows. In the model we consider, the number of wires that have to cross between two rows serves as the routing penalty. Nine versions of the folding problem are formulated and efficient algorithms are developed for each.

We develop a simple, fast, linear time algorithm to determine whether a collection of two-pin nets can be routed topologically in a plane. Topological routability testing of a collection of multi-pin nets is shown to be equivalent to planarity testing, and a simple linear time algorithm is developed for the case when the collection of modules remains connected following the deletion of all nets with more than two pins.

Experimental results are presented.


CHAPTER 1
INTRODUCTION


1.1 Background

With current technology, a single chip can have several million transistors. Design and fabrication of such chips are made possible by automating the steps involved in developing the chip. Starting with the formal specifications, the VLSI design cycle goes through a series of steps to produce the final product, a fully packaged chip. The VLSI design cycle consists of the following steps [24]:

1. System Specification: In this step the high level representation of the system is created. Performance, functionality, physical dimensions, the choice of design techniques, and the fabrication technology are considered in this step.

2. Functional Design: The output of this step is a timing diagram, which is obtained by considering the behavioral aspects of the system.

3. Logic Design: The logic design, in general, is represented by Boolean expressions. The logic design that represents the functional design is obtained in this step. The Boolean expressions are minimized to obtain the smallest logic design. The correctness of the logic design is also verified in this step.

4. Circuit Design: A circuit that represents the logic design of the system is developed in this step, taking into consideration speed and power requirements and the electrical behavior of the components used in the circuit.

5. Physical Design: This is the most time-consuming step in the VLSI design cycle. In this step, the components and the interconnections are represented by geometric patterns. The objective of this step is to obtain an arrangement of these geometric patterns that minimizes area and power and satisfies the timing requirements of the chip. Due to its high complexity, this step is broken down into smaller sub-steps. We will look at this step in detail later in this chapter.

6. Design Verification: In this step, design rule checking and circuit extraction are done to verify that the circuit layout from the physical design step satisfies the system specification and the design rules.

7. Fabrication: The verified layout is used in the fabrication process to produce the chip.

8. Packaging, Testing and Debugging: The fabricated chip is packaged and tested to ensure proper functioning.

Each step in the design cycle can be viewed as a change in the representation of the system. The steps in the VLSI design cycle iteratively improve the representation to meet the specifications.


1.2 Physical Design Automation

The physical design step maps a circuit design into a physical circuit. The input to this step is a circuit design, which is represented by a set of modules, a set of nets, a chip carrier, and the design rules. The modules and the chip carrier are usually rectangular. The output of the physical design step is a layout of the modules and interconnections that has the desired functionality.

There are several objective functions that are used in the physical design step. If the chip size is not fixed, then the objective is to find a minimum area layout. When circuit speed is a consideration, the objective may be to minimize the critical net length or to minimize the sum of connection lengths.

The field of physical design automation involves developing algorithms and data structures that can be used in the layout process. The algorithms are used to obtain solutions that satisfy the objective functions and meet the design rules. Large designs and the iterative improvements made by physical design engineers require that the algorithms be very fast.

Physical design is an extremely complex process that is usually broken down into smaller problems such as partitioning, floorplanning and placement, routing, and compaction.

In the partitioning step, the components of a large circuit are divided into a collection of smaller subcircuits/modules according to some criteria. The factors that are considered may be the size of the modules, the number of modules, and the number of interconnections between the modules. At the end of the partitioning step, we have a set of modules and a set of interconnections required between modules.

Selecting the areas, power consumptions, aspect ratios, and I/O pin locations of the modules forms the floorplanning step. The floorplanning step optimizes design quality in terms of chip area, power consumption, timing performance, and wire density. Floorplanning is an important step as it lays the foundation for the final layout. The precise locations of the components are determined during the placement step to optimize area and timing.

The routing phase completes the interconnections between the modules. Routing is usually divided into three smaller sub-problems: global routing, detailed routing, and specialized routing. The global router decomposes a large routing problem into small and manageable problems. Steiner trees and spanning trees are the commonly used approaches for net connection in global routing. Detailed routing includes switchbox, channel, and planar routing. Planar routing is a problem in which the interconnection topology of the nets is planar; that is, all connections can be realized on a single layer. Single layer routing is not always possible. In MCM technology with many routing layers, a subset of nets that is planar routable is preferred. Planar routing is usually preferred because no vias are needed for the interconnections, and vias reduce the reliability and performance of a circuit. Routing clock nets and power-ground nets are specialized routing problems.


During the compaction phase, the components and the interconnections are moved so as to further optimize the layout in terms of area and delay. By compressing the chip, the components come closer, thereby reducing the delay between them. This step must also ensure that compressing the chip does not violate the design rules.

1.3 Thesis Outline

One placement method is to obtain a linear list of components that minimizes some criterion and then fold this list into a given height or width so as to minimize the area. The objective functions that can be used when forming the linear list of components include minimizing the maximum density of wires between adjacent components, minimizing the total number of wire segments between adjacent components, and minimizing the maximum net length [1].

In the bit-sliced placement model introduced by Paik and Sahni [21], component reordering is not permitted. That is, the components are ordered by some objective function to obtain a component stack. This stack is folded into a layout that is either height-constrained (the layout height is given) or width-constrained (the layout width is given). In Chapter 2, we look into two problems considered by Paik and Sahni [21]. We introduce a normalization technique which, in combination with the greedy method and parametric search, yields linear time algorithms for these problems.


In Chapter 3, we develop optimal algorithms to fold a linearly ordered list of standard and custom cells under various optimization constraints. A total of nine problems are formulated and their solutions provided.

In Chapter 4, we look into the planar topological routability problem. We develop a linear time algorithm for planar topological routability for the case when all nets are 2-pin nets. This algorithm determines the topological routability of the given problem instance and, when the instance is topologically routable, the loose route of the wires. We also consider the case when there are multi-pin nets. For this case, we prove that (a) the topological routability problem is equivalent to the graph planarity problem, and (b) the problem of finding the maximum number of nets that are topologically routable is NP-hard. A linear time algorithm is developed for the case when the circuit modules remain connected following the deletion of all nets that have more than two pins.

Finally, we present conclusions and some future directions for this research.
CHAPTER 2
FOLDING A STACK OF EQUAL WIDTH COMPONENTS


2.1 Background

Wu and Gajski [31] introduced a new sliced-layout architecture to alleviate the problems of general bit-sliced layouts. Most fabricated chips can be described by register-transfer schematics. In addition to gates, latches, and flip-flops, schematics include register-transfer components such as registers, counters, adders, ALUs, shifters, multiplexers, and register files. Standard cell methodology decomposes these components into basic gates, latches, and flip-flops before layout. Wu and Gajski [31] suggest that greater layout density can be achieved if register-transfer components are laid out in a bit-sliced layout architecture.

For each microarchitectural component there is a layout generator that includes bit-slice generators. All generated bit slices are of the same width. If a component has a width w, then it has w slices. Each microarchitectural component has a different height. The component includes cell abutment, over-the-cell routing, and an inter-slice switch box to alleviate the problems of the previous approaches. Intracell routing is done on metal 1 and intercell routing on metal 2. All regular components in a design are stacked (a stack of components) and routed in metal 2. Component stack folding, in the context of bit-sliced architectures introduced by Larmore, Gajski, and Wu [14], is to fold this stack into itself in a way that minimizes the wasted area. Stack folding (stack partitioning) is also done when there are too many components in a single stack. In [14], this model was used to compile layouts for CMOS technology. Further applications of the model were considered by Wu and Gajski [31].

In the model of Larmore et al. [14] and Wu and Gajski [31], the component stack can be folded at only one point. In addition, it is possible to reorder the components on the stack. These folding schemes begin by reordering the components by width. They also show that the folding problem using this model is NP-complete.

A related, yet different, folding model was considered by Paik and Sahni [21]. In this model, no limit is placed on the number of points at which the stack may be folded. Also, component reordering is forbidden. They point out that this restriction is realistic as the component stack is usually ordered so as to minimize intercomponent routing requirements and optimize performance. They also point out that this model may be used in the application cited by Larmore et al. [14] and Wu and Gajski [31]. Furthermore, it accurately models the placement step of the standard cell and sea-of-gates layout algorithms of Shragowitz et al. [26, 25]. In the case of standard cell designs, all modules have the same width, while in the case of sea-of-gates designs module widths and heights vary from module to module.

2.2 Introduction

A stack of equal width components is comprised of variable height components $C_1, C_2, \ldots, C_n$ stacked one on top of the other. $C_1$ is at the top of the stack and $C_n$ at the bottom (Figure 2.1(a)). If the stack is realized physically in this way, the area needed is $w \sum_{i=1}^{n} h_i$, where $h_i > 0$ is the height of $C_i$ and $w > 0$ is the width of each component. If the component stack is folded at $C_i$, we obtain two adjacent stacks $C_1, C_2, \ldots, C_i$ and $C_n, C_{n-1}, \ldots, C_{i+1}$. The folding also inverts the left-to-right orientation of the components $C_n, \ldots, C_{i+1}$. Figure 2.1(b) shows the stack of Figure 2.1(a) after folding. Notice that folding results in a snake-like rearrangement. While not apparent from the figure, each fold flips the left-to-right orientation of a component. As can be seen from Figure 2.1(b), pairs of folded stacks may have nested components; components in odd stacks are left aligned and components in even stacks are right aligned. The area of the folded stack is the area of the smallest rectangle that bounds the layout. To determine this, depending on the model, we may need to add space at the stack ends to allow for routing between components $C_i$ and $C_{i+1}$, where $C_i$ is a folding point. If so, let $r_i > 0$, $2 \le i \le n$, denote the height of the routing space needed if the stack is folded at $C_{i-1}$ (Figure 2.2).

Figure 2.1. Stack of equal width components

Figure 2.2. Routing space reserved

Table 2.1. Summary of results of Paik and Sahni

                                                   Routing area at stack ends
Problem                                            No                     Yes
Equal width, height constrained                    $O(n)$                 $O(n^2)$
Equal width, width constrained                     $O(n)$                 $O(n^3)$
Equal height, height constrained                   $O(n^4 \log n)$        $O(n^4 \log n)$
Equal height, width constrained                    $O(n^4 \log^2 n)$      $O(n^4 \log^2 n)$
Variable heights and widths, height constrained    $O(n^4 \log n)$        $O(n^4 \log n)$
Variable heights and widths, width constrained     $O(n^4 \log^2 n)$      $O(n^4 \log^2 n)$

(Source: Paik and Sahni [21])
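Once the fold points are fixed, the bounding rectangle described above can be computed directly. The Python sketch below is only an illustration; the function name `folded_dimensions` and its argument layout are assumptions, not part of the model of [21]. It assumes that every stack reserves the routing space of its fold at both of its ends, and it takes $r_1 = r_{n+1} = 0$ (no routing space at the two ends of the unfolded stack). Because all components have width $w$, the width of the layout is simply the number of stacks times $w$.

```python
from typing import List, Tuple


def folded_dimensions(h: List[int], r: List[int], w: int,
                      folds: List[int]) -> Tuple[int, int]:
    """Return (width, height) of the rectangle bounding a folded stack.

    h[i-1] = h_i for 1 <= i <= n, and r[i-1] = r_i for 1 <= i <= n + 1,
    with r_1 = r_{n+1} = 0 (assumed).  folds lists the distinct 1-based
    indices i, 1 <= i < n, at which the stack is folded: C_i ends one
    stack and C_{i+1} starts the next one.
    """
    n = len(h)
    bounds = [0] + sorted(folds) + [n]  # stack s holds C_{bounds[s]+1} .. C_{bounds[s+1]}
    height = 0
    for s in range(len(bounds) - 1):
        k, m = bounds[s] + 1, bounds[s + 1]
        # r_k is reserved at the end holding C_k, r_{m+1} at the end holding C_m
        height = max(height, r[k - 1] + sum(h[k - 1:m]) + r[m])
    return w * (len(bounds) - 1), height
```

For example, `folded_dimensions([2, 2, 2, 2], [0, 1, 1, 1, 0], 3, [2])` returns `(6, 5)`: two stacks of width 3, each needing $2 + 2$ units for its components plus 1 unit of routing space at the fold.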

In practical situations, the height (width) of the rectangle into which the stack is to be folded may be limited (and known in advance), and we are to minimize the width (height). Several versions of folding into height (width) constrained rectangles were considered by Paik and Sahni [21]. Their results are summarized in Table 2.1.

In this chapter we consider two of the problems considered by Paik and Sahni [21]:

(1) Equal-width, height-constrained with routing area at stack ends. In this problem, we are to fold a stack of equal width components into a rectangle of given height so as to minimize the width (and hence area) of the rectangle. For this problem, the algorithm of [21] runs in $O(n^2)$ time. We develop an $O(n)$ algorithm.

(2) Equal-width, width-constrained with routing area at stack ends. Here the width of the rectangle into which the folding occurs is given, and we are to minimize its height (and hence area). Four algorithms with complexity $O(n \log n)$, $O(n \log\log n)$, $O(n \log^* n)$, and $O(n)$, respectively, are obtained. Experimental results indicate that the $O(n \log n)$ algorithm is fastest in practice because it has the least overhead.

Our algorithms employ two techniques. The first is normalization, in which an input instance is transformed into an equivalent normalized instance that is relatively easy to solve. The second technique is parametric search. In Section 2.3 we describe our normalization technique, and then in Section 2.4 we show how this results in a linear time algorithm for the equal-width height-constrained problem. Parametric search is described in Section 2.5 and then used in Section 2.6 to obtain the algorithms for the equal-width width-constrained problem. Experimental results comparing the relative performance of the various algorithms for the equal-width width-constrained problem are given in Section 2.7.

2.3 Normalization

Let $h_i$ be the height of component $C_i$, $1 \le i \le n$. Let $r_i$ be the routing height needed between $C_{i-1}$ and $C_i$ if the component stack is folded at $C_{i-1}$, $2 \le i \le n$, and let $r_1 = r_{n+1} = 0$. The component stack so defined is normalized iff the conditions C1 and C2 given below are satisfied for every $i$, $1 \le i \le n$.
Figure 2.3. Case when $h_j + r_{j+1} < r_j$

C1: $h_i + r_{i+1} \ge r_i$

C2: $h_i + r_i \ge r_{i+1}$

An unnormalized instance $I$ may be transformed into a normalized instance $\bar{I}$ with the property that, from a minimum height or minimum width folding of $\bar{I}$, one can easily construct a similar folding for $I$. To obtain $\bar{I}$, we identify the least value of $i$ at which either C1 or C2 is violated. Let this value of $i$ be $j$. By choice of $j$, either

$h_j + r_{j+1} < r_j$, or
$h_j + r_j < r_{j+1}$.

We first note that (since $h_j > 0$) it is not possible for both of these inequalities to hold simultaneously. Suppose that $h_j + r_{j+1} < r_j$. Now $j > 1$, as $h_1 + r_2 > 0$ while $r_1 = 0$. Also, $h_j + r_j > r_{j+1}$. Consider any folding of $I$ in which $C_{j-1}$ is a fold point (Figure 2.3(a)). Let the height of stack $S_1$ be $h(S_1)$ and that of $S_2$ be $h(S_2)$. Consider the folding obtained from Figure 2.3(a) by moving $C_j$ from $S_2$ to $S_1$ (Figure 2.3(b)). Let the heights of the stacks now be $h'(S_1)$ and $h'(S_2)$. We see that

$$h'(S_1) = h(S_1) - r_j + h_j + r_{j+1} < h(S_1)$$

and

$$h'(S_2) = h(S_2) - r_j - h_j + r_{j+1} < h(S_2),$$

where both inequalities follow from $h_j + r_{j+1} < r_j$ (for the second, note that $r_{j+1} < r_j - h_j < r_j + h_j$).

So the height and width of the folding of Figure 2.3(b) are no more than those of Figure 2.3(a). Hence, the instance $I'$ obtained from $I$ by replacing the component pair $((h_{j-1}, r_{j-1}), (h_j, r_j))$ with the single component $(h_{j-1} + h_j, r_{j-1})$ has the same minimum width/height folding as does $I$. From a minimum width/height folding for $I'$ one can obtain one for $I$ by replacing the component $(h_{j-1} + h_j, r_{j-1})$ with the two components of $I$.

If $h_j + r_j < r_{j+1}$, then $I'$ is obtained by replacing the component pair $((h_j, r_j), (h_{j+1}, r_{j+1}))$ with the single component $(h_j + h_{j+1}, r_j)$. The proof is similar to the previous case.
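As a small numerical illustration of the first case (the values are hypothetical and chosen only to make the arithmetic concrete), let $h_j = 1$, $r_j = 4$, and $r_{j+1} = 2$. Condition C1 fails at $j$ because $1 + 2 < 4$, and moving $C_j$ from $S_2$ to $S_1$ gives

$$h'(S_1) = h(S_1) - 4 + 1 + 2 = h(S_1) - 1, \qquad h'(S_2) = h(S_2) - 4 - 1 + 2 = h(S_2) - 3,$$

so both stacks become shorter and nothing is lost by keeping $C_{j-1}$ and $C_j$ together.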

The component pair replacement scheme just described may be repeated as often as needed to obtain a normalized instance $\bar{I}$. The scheme terminates because each replacement reduces the number of components by one and every one-component instance is normalized.

The preceding discussion leads to the normalization procedure Normalize of Figure 2.5. The input to this procedure is a component stack C[1]...C[n], and the output is a normalized stack C[1]...C[n] (the input value of n, say n'', will generally be larger than the output value, say n'). C[i].h, C[i].r, C[i].f, and C[i].l give, respectively, the height, the routing height needed if the stack is folded at C[i - 1], the index of the first input component represented by C[i], and the index of the last input component represented by C[i].

Figure 2.4. Case when $h_j + r_j < r_{j+1}$

Procedure Normalize(C, n)
{ Normalize the component stack C[1]...C[n] }
i := 1; next := 2;
while next ≤ n + 1 do
  case
    : C[i].h + C[next].r < C[i].r :
        { Combine with C[i - 1] }
        C[i - 1].h := C[i - 1].h + C[i].h;
        C[i - 1].l := C[i].l;
        i := i - 1;
    : C[i].h + C[i].r < C[next].r :
        { Combine with C[next] }
        C[i].h := C[i].h + C[next].h;
        C[i].l := C[next].l;
        next := next + 1;
    : else : C[i + 1] := C[next];
        i := i + 1; next := next + 1;
  end;
n := i - 1;
end; {Normalize}

Figure 2.5. Normalizing a stack

At input, we have

C[i].h = $h_i$, C[i].r = $r_i$, C[i].f = C[i].l = $i$, $1 \le i \le n$,

and C[n + 1].r = 0. Note that, by definition, C[1].r = $r_1$ = 0. On output, component C[i] is the result of combining together the input components C[i].f, C[i].f + 1, ..., C[i].l. The heights and the r values are set appropriately. The correctness of procedure Normalize is established in Theorem 1. Its complexity is $O(n)$, as each iteration of the while loop takes constant time; the first two case clauses can be entered at most a total of n - 1 times, as on each entry the number of components is reduced by 1. The else clause can be entered at most n times, as on each entry next increases by 1 and this variable is never decreased in the procedure.
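Procedure Normalize translates almost line for line into Python. The sketch below is an illustration under stated assumptions, not the implementation used in the experiments: the record `Comp` and the function `normalize` are hypothetical names, all heights are assumed positive, and an explicit sentinel component with $r = 0$ plays the role of C[n + 1]. The result is C[1..n'] as a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Comp:
    h: int  # height of the (possibly combined) component
    r: int  # routing height needed if the stack is folded just before it
    f: int  # index of the first input component it represents (1-based)
    l: int  # index of the last input component it represents (1-based)


def normalize(heights: List[int], routing: List[int]) -> List[Comp]:
    """Python rendering of procedure Normalize (Figure 2.5).

    heights[i-1] = h_i > 0 and routing[i-1] = r_i for 1 <= i <= n, with r_1 = 0.
    """
    n = len(heights)
    assert len(routing) == n and routing[0] == 0
    # Slot 0 is unused so that indices match the 1-based pseudocode;
    # slot n + 1 is the sentinel C[n + 1] with r = 0.
    c: List[Optional[Comp]] = [None]
    c += [Comp(heights[k], routing[k], k + 1, k + 1) for k in range(n)]
    c.append(Comp(0, 0, n + 1, n + 1))
    i, nxt = 1, 2
    while nxt <= n + 1:
        if c[i].h + c[nxt].r < c[i].r:      # C1 fails at C[i]: combine with C[i-1]
            c[i - 1].h += c[i].h
            c[i - 1].l = c[i].l
            i -= 1
        elif c[i].h + c[i].r < c[nxt].r:    # C2 fails at C[i]: combine with C[next]
            c[i].h += c[nxt].h
            c[i].l = c[nxt].l
            nxt += 1
        else:                               # C1 and C2 hold at C[i]: move on
            c[i + 1] = c[nxt]
            i += 1
            nxt += 1
    return c[1:i]  # C[1..n'], where n' = i - 1; slot i holds the sentinel
```

For instance, `normalize([3, 1, 2], [0, 2, 0])` merges $C_2$ into $C_1$ (because $1 + 0 < 2$ violates C1 at $C_2$) and returns two components with heights 4 and 2, covering input components 1..2 and 3..3.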

Theorem 1: Procedure Normalize produces an equivalent normalized component stack.

Proof: The procedure maintains the following invariant at the start of each iteration of the while loop:

Invariant: Normalizing conditions C1 and C2 are satisfied by all components C[j], j < i.

This is clearly true when i = 1, as there is no component C[j] with j < 1. If the invariant is true at the start of some iteration, then it is true at the end of that iteration. To see this, note that if we enter the first clause of the case statement, then following the execution of this clause, C[j].h, C[j].r, and C[j + 1].r, j < i', where i' is the value of i following execution of the clause, are unchanged. So the execution does not affect C1 and C2 for j < i'. If the second case clause is entered, then again C1 and C2 are unaffected by the execution for j < i, as C[j].h, C[j].r, and C[j + 1].r, j < i, are unchanged. When the third clause is entered, the validity of C1 and C2 for j < i' follows from the fact that the conditions for the first two clauses are false.

On termination, next = n'' + 2, where n'' is the input value of n. The last iteration of the while loop could not have entered the first clause of the case statement, as in this clause next is not increased. While in the second clause next is increased, the condition C[i].h + C[i].r < C[next].r cannot be true in the last iteration, as now next = n'' + 1, C[i].h + C[i].r > 0, and C[n'' + 1].r = 0. So the last iteration caused execution of the third clause of the case statement. As a result, C[n'' + 1] is moved to position n' + 1 of C. From the invariant, it follows that C1 and C2 are satisfied for j < i' = n' + 1 (note that i' is the final value of i). Hence the output component stack C[1]...C[n'] is normalized. □

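Theorem 2 establishes an important property of a normalized stack. This property enables one to obtain efficient algorithms for the two folding problems considered in this chapter.

Theorem 2: Let $(h_i, r_i)$, $1 \le i \le n$, define a normalized component stack. Assume that $r_1 = r_{n+1} = 0$. The following are true:

$$\text{(P1)}\quad r_k + \sum_{j=k}^{m} h_j + r_{m+1} \le r_{k-1} + \sum_{j=k-1}^{m} h_j + r_{m+1}, \qquad 1 < k \le m \le n$$

$$\text{(P2)}\quad r_k + \sum_{j=k}^{m} h_j + r_{m+1} \le r_k + \sum_{j=k}^{m+1} h_j + r_{m+2}, \qquad 1 \le k \le m < n$$

Proof: Direct consequence of C2 and C1, respectively: P1 reduces to $h_{k-1} + r_{k-1} \ge r_k$, which is C2 for $i = k - 1$, and P2 reduces to $h_{m+1} + r_{m+2} \ge r_{m+1}$, which is C1 for $i = m + 1$. □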

Intuitively, Theorem 2 states that the height needed by a contiguous segment of components from a normalized stack does not decrease when the segment is expanded by adding components at either end.
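Properties P1 and P2 can be checked mechanically on small examples. In the Python sketch below, `segment_height` and `satisfies_p1_p2` are hypothetical helper names; the height of the segment $C_k, \ldots, C_m$ is taken to be $r_k + \sum_{j=k}^{m} h_j + r_{m+1}$, the quantity on both sides of P1 and P2. The check is brute force and is meant only for illustration.

```python
from typing import Sequence


def segment_height(h: Sequence[int], r: Sequence[int], k: int, m: int) -> int:
    """Height of one stack segment C_k..C_m (1-based, inclusive).

    h[i-1] = h_i (1 <= i <= n); r[i-1] = r_i (1 <= i <= n + 1), r_1 = r_{n+1} = 0.
    """
    return r[k - 1] + sum(h[k - 1:m]) + r[m]


def satisfies_p1_p2(h: Sequence[int], r: Sequence[int]) -> bool:
    """Brute-force test of P1 (extend at the top) and P2 (extend at the bottom)."""
    n = len(h)
    for k in range(1, n + 1):
        for m in range(k, n + 1):
            s = segment_height(h, r, k, m)
            if k > 1 and s > segment_height(h, r, k - 1, m):  # P1 violated
                return False
            if m < n and s > segment_height(h, r, k, m + 1):  # P2 violated
                return False
    return True


# A normalized stack satisfies both properties ...
print(satisfies_p1_p2([3, 3], [0, 1, 0]))        # True
# ... but an unnormalized one need not: adding C_2 to the segment {C_1}
# lowers its height from 0 + 2 + 5 = 7 to 0 + 3 + 1 = 4.
print(satisfies_p1_p2([2, 1, 3], [0, 5, 1, 0]))  # False
```

The second call shows why the greedy method of the next section is applied only after normalization: without P2, adding a component to a segment can make it shorter, so a segment that is too tall at one point may fit again later.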

2.4 Equal-Width Height-Constrained

The height of the layout is limited to $h$, and we are to fold the component stack so as to minimize its width. This can be accomplished in linear time by first normalizing the stack and then using a greedy strategy that folds only when the next component cannot be accommodated in the current stack segment without exceeding the height bound $h$. The algorithm is given in Figure 2.6.

From the correctness of procedure Normalize, it follows that a minimum width folding of the normalized instance is also a minimum width folding of the initial instance. So we need only show that the for loop generates a minimum width folding of the normalized instance produced by procedure Normalize. This follows from properties P1 and P2 (Theorem 2) of a normalized instance. Since the height of a segment cannot decrease when components are added at either end, the infeasibility test is correct. Also, there can be no advantage to postponing the layout of a component to the next segment if it fits in the current one: by P1, removing a component from the top of the next segment cannot increase that segment's height.


Procedure MinimizeWidth(C, n, h, width)
{ Obtain a minimum width folding whose height is at most h }
Normalize(C, n);
used := h; width := 0;  { used = h forces C[1] to open the first segment }
for i := 1 to n do
  case
    : used - C[i].r + C[i].h + C[i + 1].r ≤ h :
        { assign C[i] to the current segment }
        used := used - C[i].r + C[i].h + C[i + 1].r;
    : C[i].r + C[i].h + C[i + 1].r > h :
        { infeasible instance }
        output error message; terminate;
    : else : { start the next segment; fold at C[i - 1] }
        width := width + 1;
        used := C[i].r + C[i].h + C[i + 1].r
  end;
end; {MinimizeWidth}

Figure 2.6. Procedure to obtain a minimum width folding
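The for loop of Figure 2.6 is sketched below in Python for an input that is assumed to be already normalized (for example, by procedure Normalize). The function name `minimize_width` and the choice to return the list of segment start indices are assumptions made for illustration; `None` stands in for the error message of the infeasible case, and the width of the folding is the length of the returned list.

```python
from typing import List, Optional


def minimize_width(h: List[int], r: List[int], height: int) -> Optional[List[int]]:
    """Greedy fold of an already-normalized stack into segments of height <= height.

    h[i-1] = h_i for 1 <= i <= n, and r[i-1] = r_i for 1 <= i <= n + 1
    (r_1 = r_{n+1} = 0).  Returns the 1-based indices at which a new segment
    starts, or None if some component cannot fit even in a segment of its own.
    """
    n = len(h)
    starts: List[int] = []
    used = height  # forces C_1 to open the first segment, as in Figure 2.6
    for i in range(1, n + 1):
        alone = r[i - 1] + h[i - 1] + r[i]            # r_i + h_i + r_{i+1}
        extended = used - r[i - 1] + h[i - 1] + r[i]  # C_i added to current segment
        if extended <= height:    # C_i fits in the current segment
            used = extended
        elif alone > height:      # infeasible instance
            return None
        else:                     # fold at C_{i-1}; C_i starts a new segment
            starts.append(i)
            used = alone
    return starts
```

For example, `minimize_width([2, 2, 2, 2], [0, 1, 1, 1, 0], 5)` returns `[1, 3]`: the stack is folded once, at $C_2$, and each of the two segments has height exactly 5.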


Table 2.2. Comparison of equal-width height-constrained algorithms

n       [21]      Figure 2.6
16      0.11      0.05
64      1.80      0.14
256     24.85     0.52

Times are in milliseconds


Note that while we are able to solve the equal-width height-constrained problem in linear time using a combination of normalization and the greedy method, the algorithm of Paik and Sahni [21] uses dynamic programming on the unnormalized instance and takes $O(n^2)$ time. In Table 2.2, we give the observed run times of the two algorithms. These were obtained by running C programs on a SUN 4 workstation. As is evident, our algorithm is considerably faster than that of [21], even on small instances.

2.5 Parametric Search

In this section, we provide an overview of the parametric search method of Frederickson [4], which uses developments by Frederickson and Johnson [5, 6] and Frederickson [3]. This overview has, however, been tailored to suit our application here and is not as general as that provided by Frederickson and coworkers [3, 4, 5, 6].
Assume that we are given a sorted matrix of $O(n^2)$ candidate values $M_{ij}$, $1 \le i, j \le n$. By sorted, we mean that

$$M_{ij} \le M_{i,j+1}, \quad 1 \le i \le n,\ 1 \le j < n$$

and

$$M_{ij} \le M_{i+1,j}, \quad 1 \le i < n,\ 1 \le j \le n.$$

The matrix is provided implicitly. That is, we are given a way to compute $M_{ij}$ in constant time for any value of $i$ and $j$. We are required to find the least $M_{ij}$ that satisfies some criterion $F$. The criterion $F$ has the property that if $F(x)$ is not satisfied (i.e., $x$ is infeasible), then $F(y)$ is not satisfied for all $y < x$. Similarly, if $F(x)$ is satisfied (i.e., $x$ is feasible), then $F(y)$ is satisfied for all $y > x$. In a parametric search, the minimum $M_{ij}$ that satisfies $F$ is found by trying out some of the $M_{ij}$s. As different $M_{ij}$s are tried, we maintain two values $\lambda_1$ and $\lambda_2$, $\lambda_1 < \lambda_2$, with the properties:

(a) $F(\lambda_1)$ is infeasible.

(b) $F(\lambda_2)$ is feasible.

Initially,  Aj  =  0  and  A2  =  00  (we  assume  F  is  such  that  F(0)  is  infeasible,  F{oo) 
is  feasible,  and  Mi,  >  0  for  all  candidate  values).  To  determine  the  next  candidate 
value  to  try,  we  begin  with  the  matrix  set  S  -  {M}.  At  each  iteration,  the  matrices 
in  S  are  partitioned  into  four  equal  sized  matrices  (assume,  for  simplicity,  that  n 
is  a  power  of  2).  As  a  result  of  this,  the  size  of  S  becomes  four  times  its  previous 
size.  Next,  a  set  T  comprised  of  the  largest  and  smallest  elements  from  each  of  the 
matrices  in  5  is  constructed.  The  median  of  T  is  the  candidate  value  x  to  try  next. 
The  following  possiblities  exist  for  x  and  F{x): 

(1) $x \le \lambda_1$. Since $F(\lambda_1)$ is infeasible, $F(y)$ is infeasible for all $y \le \lambda_1$. So, $F(x)$ is infeasible.

(2) $x \ge \lambda_2$. Now, $F(x)$ is feasible.

(3) $\lambda_1 < x < \lambda_2$. $F(x)$ may be feasible or infeasible. This is determined by computing $F(x)$. If $x$ is feasible, $\lambda_2$ is set to $x$. Otherwise, $\lambda_1$ is set to $x$.

Following the update (if any) of $\lambda_1$ or $\lambda_2$ that results from trying out the candidate value $x$, every matrix in $S$ that contains no candidate value $y$ in the range $\lambda_1 < y < \lambda_2$ may be eliminated from $S$.

Procedure PSEARCH(S, λ1, λ2, dimension, finish);
begin
    repeat
        if dimension > 1 then [ replace each matrix in S by
                                four equal sized submatrices;
                                dimension := dimension / 2 ]
        for i := 1 to 3 do
        begin
            if dimension = 1 then
                [ Let T be the multiset of values in all matrices of S; ]
            else
                [ Let T be the multiset obtained by selecting the largest
                  and smallest values from each matrix of S; ]
            x := median(T);
            if (λ1 < x < λ2) then
                if F(x) is feasible then λ2 := x
                else λ1 := x;
            Eliminate from S all matrices that have no values y
            such that λ1 < y < λ2;
        end;
    until dimension² * |S| ≤ finish;
end; {PSEARCH}

Figure 2.7. Procedure for parametric search
A more precise statement of the search process is given by procedure PSEARCH (Figure 2.7). This procedure may be invoked as PSEARCH({M}, 0, ∞, n, 0). Here, dimension is the current number of rows (equivalently, columns) in each matrix of $S$, and finish is a stopping rule. The search for the minimum candidate that satisfies $F$ is terminated when the number of remaining candidates is $\le$ finish. If $\lambda_2 = \infty$ when PSEARCH terminates, then none of the candidate values is feasible. If $\lambda_2$ is finite, then it is the smallest feasible candidate.

Since we have assumed that $n$ is a power of 2, each time a matrix is divided into four, the submatrices produced are square and have a dimension that is also a power of 2. Since $M$ is provided implicitly, each of its submatrices can also be stored implicitly. For this, we need merely record the matrix coordinates (indices) of the top left and bottom right elements (actually, the latter can be computed from the former using the submatrix dimension). The multiset $T$ required on each iteration of the for loop is easy to construct because $M$ is sorted. Since $M$ is sorted, all of its submatrices are also sorted. Consequently, the largest element of each submatrix is in its bottom right corner and the smallest is in its top left corner. These elements can therefore be determined in constant time per matrix of $S$.
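To make the search concrete, here is a minimal Python sketch of PSEARCH for a single implicit sorted matrix. It illustrates the idea under stated assumptions and is not the implementation used in our experiments. Indices are 0-based, and the matrix is given by a hypothetical function `entry(i, j)`. `feasible(x)` stands for the monotone criterion $F$. Each submatrix is stored as its top-left corner together with the common dimension. `median_low` (which sorts $T$) replaces the linear-time selection that the complexity analysis assumes.

```python
from statistics import median_low


def psearch(entry, n, feasible, finish=0):
    """Sketch of PSEARCH for one implicit n x n sorted matrix (n a power of 2).

    entry(i, j) returns M[i][j] (0-based); rows and columns are nondecreasing
    and all entries are >= 0. feasible(x) is the monotone criterion F, with
    F(0) infeasible. With finish = 0 the result is the least feasible
    candidate, or inf if no candidate is feasible.
    """
    lam1, lam2 = 0, float("inf")
    S = [(0, 0)]               # top-left corners of the live submatrices
    dim = n                    # common dimension of the submatrices in S
    while True:
        if dim > 1:            # replace each submatrix by its four quadrants
            half = dim // 2
            S = [(r + dr, c + dc) for (r, c) in S
                 for dr in (0, half) for dc in (0, half)]
            dim = half
        for _ in range(3):
            if not S:
                break
            if dim == 1:
                T = [entry(r, c) for (r, c) in S]
            else:              # smallest value at top-left, largest at bottom-right
                T = [v for (r, c) in S
                     for v in (entry(r, c), entry(r + dim - 1, c + dim - 1))]
            x = median_low(T)  # an actual element of T
            if lam1 < x < lam2:
                if feasible(x):
                    lam2 = x
                else:
                    lam1 = x
            # Keep only submatrices that may still hold some y, lam1 < y < lam2.
            S = [(r, c) for (r, c) in S
                 if entry(r, c) < lam2
                 and entry(r + dim - 1, c + dim - 1) > lam1]
        if dim * dim * len(S) <= finish:   # the "until" test of Figure 2.7
            return lam2
```

For example, with `entry = lambda i, j: i + j + 1` on a 4 × 4 matrix and `feasible = lambda x: x >= 5`, the sketch returns 5.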

Theorem 3: [4] The number of feasibility tests $F$ performed by procedure PSEARCH when started with $S = \{M\}$, where $M$ is an $n \times n$ sorted matrix that is provided implicitly, is $O(\log n)$. The total time spent obtaining the candidates for feasibility testing is $O(n)$. □

Corollary 1: Let $t(n)$ be the time needed to determine if $F(x)$ is feasible. The complexity of PSEARCH is $O(n + t(n)\log n)$. □

For some of the algorithms we describe later, PSEARCH will be initiated with $|S| > 1$. That is, $S$ will initially contain more than one matrix, although all matrices in $S$ will still be of the same size. To analyze the complexity of these algorithms, we shall use the following theorem and corollary.

Theorem 4: [4] If PSEARCH is initiated with $S$ containing $m$ sorted matrices, each of dimension $n$, then the number of feasibility tests is $O(\log n)$. The total time spent obtaining the candidate values for these tests is $O(mn)$. □

Corollary 2: Let $t(n)$ be as in Corollary 1. The complexity of PSEARCH under the assumptions of Theorem 4 is $O(mn + t(n)\log n)$. □

We have described PSEARCH under the assumption that the matrices of candidate values are square and have a dimension that is a power of 2. Parametric search easily handles other matrix shapes and sizes. For this, we can add rows at the top and columns at the left so that the matrices become square and have a dimension that is a power of 2. The entries in the new rows and columns are 0. Since all candidate values are nonnegative, the padded matrices remain sorted. This does not affect the asymptotic complexity of PSEARCH. Alternatively, we can modify the matrix splitting process so that each matrix is partitioned into four roughly equal submatrices at each step. The details of these generalizations are given in the literature [3, 4, 5, 6].
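The padding step can be sketched as a wrapper around the implicit entry function. This is a minimal Python illustration with three assumptions. Indices are 0-based. The original rows × cols matrix is given by a hypothetical function `entry(i, j)`. All entries are nonnegative, which is what keeps the zero-padded matrix sorted.

```python
def pad_to_power_of_two(entry, rows, cols):
    """Wrap an implicit rows x cols sorted matrix as an N x N sorted matrix.

    Zero rows are added at the top and zero columns at the left, where N is
    the smallest power of 2 with N >= max(rows, cols). Entries are assumed
    to be nonnegative, so the padded matrix is still sorted.
    """
    N = 1
    while N < max(rows, cols):
        N *= 2
    dr, dc = N - rows, N - cols  # numbers of padding rows and columns

    def padded(i, j):
        if i < dr or j < dc:
            return 0             # padding entry
        return entry(i - dr, j - dc)

    return padded, N
```

The returned function can be handed to any search routine that expects a square matrix whose dimension is a power of 2.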

Procedure PSEARCH is a restricted version of procedure MSEARCH of [4]. Frederickson and Johnson [5, 6] give an alternative search algorithm in which the for loop is iterated twice: once with $T$ being the multiset of the largest values in $S$ and once with $T$ being the multiset of the smallest values in $S$. We experimented with both formulations and found that, for our stack folding application, the three-iteration formulation of Figure 2.7 is faster by approximately 43%.

2.6 Equal-Width Width-Constrained
To use parametric search to determine the minimum height folding when the layout width is constrained to be $\le w$, we must do the following:

(1) Identify a set of candidate values for the minimum height folding. This set must be provided implicitly as a sorted matrix with the property that each matrix entry can be computed in constant time.

(2) Provide a way to determine if a candidate height $h$ is feasible; i.e., can the component stack be folded into a rectangle of height $h$ and width $w$?

In this section, for (1), we shall provide an $n \times n$ sorted matrix $M$ of candidate values ($n$ is the number of components in the stack). For the feasibility test of (2), we can use procedure MinimizeWidth of Figure 2.6. We set $h$ equal to the candidate height value being tested and then determine whether width $\le w$ following execution of the procedure. The component stack needs to be normalized only once, while MinimizeWidth will be invoked for $O(\log n)$ candidate values. So, the call to Normalize should be removed from procedure MinimizeWidth, and normalization should be done before the first invocation of this procedure. Also, the remaining code may be modified to terminate as soon as $w$ folds are made.
Since feasibility testing and normalization each take linear time, it follows from Corollary 1 that the complexity of the described parametric search to find the minimum height folding is $O(n + t(n)\log n) = O(n + n\log n) = O(n\log n)$.

To determine the candidate matrix $M$, we observe that the height of any layout is given by

$$r_i + \sum_{k=i}^{j} h_k + r_{j+1}$$

for some $i, j$, $1 \le i \le j \le n$. This formula just gives us the height of the segment that contains components $C_i$ through $C_j$. Define $Q$ to be the $n \times n$ matrix with the elements

$$Q_{ij} = \begin{cases} r_i + \sum_{k=i}^{j} h_k + r_{j+1}, & i \le j \\ 0, & i > j \end{cases}$$

Then, for every value of $w$, $Q$ contains a value that is the height of a minimum height folding of the component stack such that the folding has width $\le w$. From Theorem 2, it follows that

$$Q_{ij} \le Q_{i,j+1}, \quad 1 \le i \le n,\ 1 \le j < n$$

Let $M_{ij} = Q_{n-i+1,j}$, $1 \le i, j \le n$. So, $M$ is a sorted matrix that contains all candidate values. The minimum $M_{ij}$ for which a width-$w$ folding is possible is the height of the minimum height width-$w$ folding. We now need to show how the elements of $M$ may be computed efficiently given the index pair $(i, j)$. Let

$$H_i = \sum_{k=1}^{i} h_k, \quad 1 \le i \le n$$

and let $H_0 = 0$. We see that

$$Q_{ij} = \begin{cases} r_i + H_j - H_{i-1} + r_{j+1}, & i \le j \\ 0, & i > j \end{cases}$$

and so,

$$M_{ij} = \begin{cases} r_{n-i+1} + H_j - H_{n-i} + r_{j+1}, & i + j \ge n + 1 \\ 0, & i + j < n + 1 \end{cases}$$


So, if we precompute the $H_i$'s, each $M_{ij}$ can be determined in constant time. The precomputation of the $H_i$'s takes $O(n)$ time. Hence, the overall complexity of the parametric search algorithm to find the minimum height folding remains $O(n\log n)$.
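The constant-time evaluation of $M_{ij}$ can be sketched as follows. This is a minimal Python illustration that assumes 1-based lists with an unused index 0: `h[1..n]` holds the component heights and `r[1..n+1]` holds the routing heights. The function name is hypothetical. The sketch only evaluates entries. The resulting matrix is sorted only when the stack has been normalized, and no normalization is done here.

```python
def make_candidate_entry(h, r):
    """Return M(i, j), 1 <= i, j <= n, for the width-constrained search.

    h[1..n]: component heights; r[1..n+1]: routing heights (index 0 unused).
    """
    n = len(h) - 1
    H = [0] * (n + 1)          # prefix sums: H[i] = h[1] + ... + h[i], H[0] = 0
    for i in range(1, n + 1):
        H[i] = H[i - 1] + h[i]

    def M(i, j):
        if i + j < n + 1:
            return 0
        # Height of the segment C_{n-i+1} .. C_j, including both routing heights.
        return r[n - i + 1] + H[j] - H[n - i] + r[j + 1]

    return M
```

For `h = [0, 3, 1, 4, 2]` and `r = [0, 0, 2, 1, 3, 0]`, `M(2, 3)` is $r_3 + h_3 + r_4 = 1 + 4 + 3 = 8$.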

We note that our $O(n\log n)$ algorithm is very similar to the $O(n\log n)$ algorithm of [5] that partitions a path into $k$ subpaths such that the length of the shortest subpath is maximized. The differences are that

(1) we need to normalize the component stack before parametric search can be used, and

(2) the definition of $M_{ij}$ needs to be adjusted to account for the routing heights $r_i$ and $r_{j+1}$ needed at either end of the stack.

References [5, 6, 3, 4] present several refinements of the basic parametric search technique. These refinements apply to the equal-width width-constrained problem just as well as to the path partitioning problem, provided we start with a normalized instance and use the candidate matrix $M$ defined above. For our component stack problem, they result in algorithms of complexity $O(n\log\log n)$, $O(n\log^* n)$, and $O(n)$.

2.7 Experimental Results

The four parametric search algorithms for the equal-width width-constrained problem were programmed in C and run on a SUN 4 workstation. For comparison purposes, the $O(n^3)$ dynamic programming algorithm of Paik and Sahni [21] was also programmed. Table 2.3 gives the run time performance of these five algorithms. These times represent the average time for ten instances of each size. The component heights were obtained using a random number generator. The four parametric search algorithms did not exhibit much run time variation among instances with the same number of components. The algorithm of Paik and Sahni [21] takes much more time than each of the parametric search algorithms. Within the class of parametric search algorithms, the $O(n\log n)$ one is fastest in the tested problem size range. This may be attributed to the increased overhead associated with the remaining algorithms. The $O(n\log n)$ algorithm is recommended for use in practice unless the number of components in a stack is very much larger than 4096.
Table 2.3. Run times of equal-width width-constrained algorithms

       n      [21]   O(n log n)   O(n log log n)   O(n log* n)      O(n)
      16       4.9       1.47           2.28            1.49         1.52
      64     314.7       8.84          15.75           27.14        26.71
     256   23255        45.96          76.55          169.58       169.42
    4096       -      1041.90        2148.60         2597.75      2760.25

Times are in milliseconds


2.8 Conclusions

We have shown that the equal-width height-constrained and equal-width width-constrained stack folding problems can be solved by applying the greedy method and parametric search, respectively, if the input is first normalized. Normalization can be done in linear time. Hence, the overall complexity is determined by that of applying the greedy method or parametric search to the normalized data.

We  have  developed  a  linear  time  algorithm  for  the  equal-width  height-constrained 
problem.  This  compares  very  favorably  (both  analytically  and  experimentally)  with 
the  O(n^)  dynamic  programming  algorithm  of  Paik  and  Sahni  [21]. 

For the equal-width width-constrained problem, we have developed four algorithms of complexity $O(n\log n)$, $O(n\log\log n)$, $O(n\log^* n)$, and $O(n)$, respectively. All compare very favorably with the $O(n^3)$ dynamic programming algorithm of Paik and Sahni [21]. Experimental results indicate that the $O(n\log n)$ algorithm performs best on instances of practical size.


CHAPTER 3
STANDARD AND CUSTOM CELL FOLDING


3.1 Introduction

Standard cell and gate array design styles are characterized by a row (column) organization of the layout. The layout area is divided into a number of parallel rows separated by routing channels, as shown in Figure 3.1. The layout problem is generally divided into two independent subtasks: placement and routing. In the placement step, the appropriate locations and orientations of the standard cells are decided. In the routing step, the required connections are added.

One approach to placement is linear ordering with folding [26, 13, 2]. In this approach, the placement is divided into two distinct steps. The first is linear ordering, in which an order of the modules is determined so as to minimize either the connection length or the maximal density of connections for modules positioned in one line. The folding step maps the linear order into the row structure of the chip. The linear ordering problem is NP-hard. Heuristic strategies that minimize the connection length as well as the maximal density of connections are discussed in [26]. The greedy strategy is adopted in [26] for folding the ordered modules.

Figure 3.1. Standard cell Architecture

In this chapter, we consider only the second step of the placement approach just described. We begin with an ordered component list $C_1, C_2, \ldots, C_n$ and develop algorithms to fold this list into rows. If the list is folded at $C_i$, then component $C_i$ is in one row and $C_{i+1}$ is in the next. If the list is folded at $C_i$ and $C_j$ and at no component $C_k$ for $i < k < j$, then components $C_{i+1}, \ldots, C_j$ are in the same row. Suppose the list is folded at $C_i$. The channel height needed between the rows containing $C_i$ and $C_{i+1}$ may be estimated [8] from the number of nets that have a pin in one of the components $C_1, \ldots, C_i$ as well as in one of the components $C_{i+1}, \ldots, C_n$. Let this height estimate be $l_i$, $1 \le i < n$, and let $l_n = 0$.
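The crossing counts behind the $l_i$'s can be computed for all $n$ cuts in a single pass with a difference array. The Python sketch below assumes that the height estimate is simply the number of crossing nets. It does not reproduce the conversion used in [8]. Nets are given as collections of 1-based component indices, and the function name is hypothetical.

```python
def crossing_counts(n, nets):
    """Return l[0..n]: l[i] = number of nets with a pin in C_1..C_i and a pin
    in C_{i+1}..C_n (index 0 unused). l[n] is 0 by construction."""
    diff = [0] * (n + 2)
    for pins in nets:
        lo, hi = min(pins), max(pins)
        if lo < hi:            # the net crosses every cut i with lo <= i < hi
            diff[lo] += 1
            diff[hi] -= 1
    l = [0] * (n + 1)
    running = 0
    for i in range(1, n + 1):
        running += diff[i]
        l[i] = running
    return l
```

For example, with five components and the nets {1, 3}, {2, 5}, {4} and {3, 4, 5}, the counts are $l_1, \ldots, l_5 = 1, 2, 2, 2, 0$.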
We study the following folding problems:

1. Standard cell folding to minimize total routing channel area subject to a chip width constraint $W$. Since each routing channel has the same width, the chip area assigned for routing is minimized when the sum of the channel heights is minimized. This problem is solved in $O(n)$ time using dynamic programming (Section 3.2.1). Note that whenever we use the term chip area, we could instead use subchip area.

2. Standard cell folding to minimize chip area subject to a chip width constraint $W$. In this problem, both the routing area and the area assigned for the components are considered. Since the chip width is fixed at $W$, area minimization is equivalent to minimizing chip height. In Section 3.2.1, we use dynamic programming to obtain an $O(n)$ algorithm for this problem.

3. Standard cell folding to minimize total routing area subject to a total routing channel height constraint $H$. This problem differs from problem 1 only in one respect. Here, the total height of the routing channels is fixed at $H$ and their width is variable. In problem 1, the routing channels have variable total height and fixed width $W$. In Section 3.2.2, we show how to solve this problem in $O(n\log n)$ time.

4. Standard cell folding to minimize chip area subject to a chip height constraint $H$. This problem is solved in $O(n\log n)$ time in Section 3.2.2.

5. Standard cell folding using equal height channels of width $W$. We are to find a folding that uses channels of minimum height. Among all such foldings, one that uses the fewest routing channels (and hence the fewest component rows) is to be found. In Section 3.3.1, we develop an $O(n\log n)$ expected time algorithm for this problem. However, for most practical instances, the algorithm has run time $O(n)$.
6. Standard cell folding using equal height routing channels of width W. Find a
folding  that  minimizes  the  total  chip  area.  This  can  be  done  in  ©(n^)  time  (see 
Section 3.3.2).

7. Standard cell folding using equal height channels and a chip of height $H$. The folding should minimize the total chip area. Our algorithm for this problem
can  be  found  in  Section  3.3.3.  Its  complexity  is  0{n'^).  /^ 

8. Custom cell folding to minimize total chip area subject to a chip width $W$. Note that in standard cell layout, all cells/components/modules have the same height and may have variable widths. In custom cell layout, the cells may differ in both height and width. We assume that the height of a cell row is set to the height of the tallest cell assigned to that row. In Section 3.4.1, we develop an $O(n\log n)$ algorithm for this problem.

9. Custom cell folding to minimize total chip area subject to a chip height constraint $H$. We solve this problem in Section 3.4.2 using an algorithm of complexity $O(n\log^2 n)$.

We  note  that  problem  8  has  been  studied  previously  in  [21]  in  the  context  of  bit 
slice  stack  folding.  The  algorithm  developed  there  has  complexity  O(n^)  while  ours 
has  complexity  O(nlogn).  Problem  9  has  also  been  studied  in  [21].  Our  O(nlog^n) 
algorithm  is  an  improvement  over  the  0{n^\ogn)  algorithm  developed  in  [21]. 
3.2 Standard Cell Folding (Problems 1-4)
Our discussion of problems 1-4 is divided into two parts. In Section 3.2.1, we consider problems 1 and 2. In both of these, the chip width, and hence the cell and routing channel widths, are fixed at $W$. In Section 3.2.2, we consider problems 3 and 4, in both of which the chip height is fixed at $H$. In all four problems, the routing channels have variable height. Each cell, and hence each cell row, has height $h$. The width of cell $i$ is $w_i$, $1 \le i \le n$. Let $w_{ij} = \sum_{k=i}^{j} w_k$, $1 \le i \le j \le n$. When the chip width is fixed at $W$, we may assume that $w_i \le W$, $1 \le i \le n$.

3.2.1 Width Constrained Case (Problems 1 and 2)

We first consider problem 1, in which we are to minimize the total routing area. Since the channel widths are fixed at $W$, it is sufficient to minimize the sum of channel heights. Suppose that $C_1, \ldots, C_n$ is folded at $C_i$ in an optimal folding $X$. Then the folding of $C_1, \ldots, C_i$ in $X$, as well as that of $C_{i+1}, \ldots, C_n$, must be a minimum area folding. Hence, the principle of optimality holds and we can use dynamic programming [10].

Let $f(i, s)$, $i \le s$, denote the minimum sum of channel heights when the component list $C_i, \ldots, C_n$ is folded such that $C_i, \ldots, C_s$ are in one cell row and the first fold is at $C_s$ (so $C_{s+1}$ is in the next cell row). It is easy to see that $f(n, n) = l_n = 0$. For $1 \le i < s \le n$, we get

$$f(i, s) = \begin{cases} \infty, & \text{if } w_{is} > W \\ f(i+1, s), & \text{otherwise} \end{cases} \tag{3.1}$$

Also, for $1 \le i = s < n$, we get

$$f(i, i) = \min_{i < q \le n} \{ f(i+1, q) + l_i \} \tag{3.2}$$


The solution to problem 1 is obtained in two steps. First, Equations 3.1 and 3.2 are used to determine $f(i, s)$, $1 \le i \le s \le n$. Then the minimum of $f(1, j)$, $1 \le j \le n$, is determined. The $w_{is}$'s may be precomputed in $O(n^2)$ time. Each $f(i, s)$, $i < s$, takes $O(1)$ time to compute, and $f(i, i)$ takes $O(n - i)$ time. Hence, all the $f(i, s)$'s, $i \le s$, may be obtained in $O(n^2)$ time. The minimum of the $f(1, j)$'s can be obtained in $O(n)$ time. So the overall time needed to solve problem 1 using Equations 3.1 and 3.2 is $O(n^2)$.
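For reference, Equations 3.1 and 3.2 translate directly into the following $O(n^2)$ Python sketch. It uses 1-based lists with an unused index 0: `w[1..n]` holds the widths, each $\le W$, and `l[1..n]` holds the channel heights, with $l_n = 0$. The function name is hypothetical. The sketch obtains each $w_{is}$ from suffix sums instead of a precomputed table.

```python
import math


def min_channel_sum_quadratic(w, l, W):
    """Problem 1 via Equations 3.1 and 3.2 in O(n^2) time and space.

    w[1..n]: cell widths, each <= W; l[1..n]: channel heights, l[n] = 0.
    """
    n = len(w) - 1
    Q = [0] * (n + 2)                      # w_{is} = Q[i] - Q[s + 1]
    for i in range(n, 0, -1):
        Q[i] = Q[i + 1] + w[i]
    f = [[math.inf] * (n + 1) for _ in range(n + 1)]
    f[n][n] = l[n]                         # f(n, n) = l_n = 0
    for i in range(n - 1, 0, -1):
        # Equation 3.2: fold at C_i; the next row is C_{i+1}..C_q.
        f[i][i] = min(f[i + 1][q] for q in range(i + 1, n + 1)) + l[i]
        # Equation 3.1: C_i..C_s share a row and the first fold is at C_s.
        for s in range(i + 1, n + 1):
            f[i][s] = math.inf if Q[i] - Q[s + 1] > W else f[i + 1][s]
    return min(f[1][j] for j in range(1, n + 1))
```

For widths 2, 3, 2, 4, heights $l = 5, 1, 4, 0$ and $W = 5$, the best folding is $\{C_1, C_2\}, \{C_3\}, \{C_4\}$ with total channel height $1 + 4 = 5$.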

A more careful implementation of the dynamic programming algorithm results in complexity $O(n)$. First we compute the suffix sums

$$Q_i = \sum_{k=i}^{n} w_k, \quad 1 \le i \le n$$

in $O(n)$ time. Let $Q_{n+1} = 0$. From the suffix sums, each $w_{is}$ can be computed in $O(1)$ time using

$$w_{is} = Q_i - Q_{s+1}$$

Next, from Equation 3.1 we see that, for $i < s$ and $w_{is} \le W$,

$$f(i, s) = f(i+1, s) = f(i+2, s) = \cdots = f(s, s) = F(s)$$

where $F(s)$ denotes $f(s, s)$.
So, Equation 3.1 becomes (for $i < s$)

$$f(i, s) = \begin{cases} \infty, & \text{if } w_{is} > W \\ F(s), & \text{otherwise} \end{cases} \tag{3.3}$$

Using Equation 3.3, Equation 3.2 may be rewritten as:

$$F(i) = f(i, i) = \min_{i < q \le n} \{ f(i+1, q) + l_i \} = \min_{\substack{i < q \le n \\ w_{i+1,q} \le W}} \{ F(q) + l_i \} = l_i + \min_{\substack{i < q \le n \\ w_{i+1,q} \le W}} \{ F(q) \} \tag{3.4}$$

The minimum total routing height needed is

$$\min_{\substack{1 \le i \le n \\ w_{1i} \le W}} \{ F(i) \} \tag{3.5}$$


So, problem 1 may be solved by computing the $n$ values $F(i)$ using Equation 3.4, rather than the $O(n^2)$ values $f(i, s)$ using Equations 3.1 and 3.2, and then finding the minimum of $O(n)$ $F(i)$'s in Equation 3.5. To compute the $F(i)$'s using Equation 3.4, we begin with $F(n) = 0$ and compute $F(n-1), F(n-2), \ldots, F(1)$, in that order. To compute an $F(i)$, we need to find the minimum of a multiset $S_i$ of previously computed $F$'s. Specifically,

$$S_i = \{ F(q) \mid i < q \le n \text{ and } w_{i+1,q} \le W \}$$
Observation 1: If $w_{j+1,q} > W$, then $w_{i+1,q} > W$ for $i < j$. Hence, if $F(q) \notin S_j$, then $F(q) \notin S_i$ for $i < j$. □

From Observation 1, it follows that $S_{i-1}$ may be computed from $S_i$, $1 < i < n$. We eliminate those $F(q)$'s for which $w_{iq} > W$ and add in $F(i)$. Note that, by assumption, $w_{ii} = w_i \le W$.

Lemma 1: Suppose $F(a) \in S_i$, $F(b) \in S_i$, $F(a) \le F(b)$, and $a < b$. Then we may eliminate $F(b)$ from $S_i$ and continue to compute $S_{i-1}, S_{i-2}, \ldots, S_1$ as described above. This does not affect the values of $F(i-1), \ldots, F(1)$.

Proof: Note that $F(j)$, $j \le i$, is being computed using the equation

$$F(j) = l_j + \min_{F(q) \in S_j} \{ F(q) \}$$

If $F(b)$ is eliminated from $S_i$, the value of $F(i)$ is unaffected, as $F(a) \le F(b)$. Suppose $F(a)$ is eliminated from $S_j$, $j < i$, because $w_{j+1,a} > W$. Then $F(b)$ would also be eliminated, because $a < b$ and so $w_{j+1,b} \ge w_{j+1,a} > W$. Suppose instead that $F(a)$ is eliminated because there is an $F(c) \le F(a)$ with $c < a$. Then $F(b)$ will also be eliminated, as $F(c) \le F(a) \le F(b)$ and $c < a < b$. □

Observation 1 and Lemma 1 motivate us to maintain $S$ as a sequential queue [11] in an array Result[1..n]. Result[i].q and Result[i].F together represent an entry of $S$ yielding the value $F(q)$. The elements of $S$ are stored in positions tail, tail + 1, ..., head of array Result. Each new value $F(i)$ is added at the tail end, and its $q = i$ is smaller than that of every entry already present. So the $q$ values are in ascending order left-to-right. By Lemma 1, the $F(q)$ values are then in descending order left-to-right. Procedure MinimizeHtStandard (Figure 3.2) is the resulting algorithm.

Theorem 5: The procedure MinimizeHtStandard given in Figure 3.2 is correct.

Proof: There are two parts to the working of procedure MinimizeHtStandard. The first is computing $F(i)$, during which deletions of $F(\cdot)$'s can occur. The second is inserting the computed $F(i)$ at the appropriate place in the array.

The procedure maintains the following invariant at the start of each iteration of the for loop.

Invariant: Result[tail].F > Result[tail + 1].F > ... > Result[head].F

It is clearly true when $i = n - 1$, as head = tail.

Since the invariant is true at the start of the iteration, Result[head].F is the minimum maintained $F(\cdot)$ value. The corresponding component number is maintained in Result[head].q. We check whether Q[i + 1] - Q[Result[head].q + 1] > W. If so, then by virtue of Observation 1 we can eliminate this value, which we do by decrementing the head pointer. We keep repeating this until we find a record k = Result[head].q such that Q[i + 1] - Q[k + 1] ≤ W. At the latest, this happens at the record for $q = i + 1$, since $w_{i+1} \le W$. The record pointed to by head has the minimum of the remaining $F(\cdot)$ values. We compute $F(i)$ and store it in temp. Notice that at the end of the while loop we have deleted some $F(\cdot)$'s, and the invariant still holds.

The invariant holds before the start of the second while loop, which starts at tail. If temp ≤ Result[tail].F, then we delete the record, as justified by Lemma 1. We keep doing so until temp > Result[tail].F or the queue becomes empty. Then we decrement the tail pointer and store the temp record. So, the invariant holds at the end of the iteration. Consequently, the invariant holds at the start of each iteration of the for loop, and the $F$'s are correctly computed.

The minimum height layout is the minimum of the maintained $F(\cdot)$ values that satisfy the width constraint, i.e., Q[1] - Q[Result[head].q + 1] ≤ W. The last while loop of the procedure takes care of this. The last line of the procedure returns this minimum as the value of MinimizeHtStandard. □

Procedure MinimizeHtStandard;
{ Compute the minimum height layout }
begin
    { Initialize: the queue holds only F(n) = 0 }
    head := n; tail := n;
    Result[tail].F := 0; Result[tail].q := n;
    { Compute F(i) }
    for i := n - 1 downto 1 do
    begin
        { Compute S_i }
        while (Q[i + 1] - Q[Result[head].q + 1] > W) do
            head := head - 1; { delete from S_{i+1}, Observation 1 }
        temp := l[i] + Result[head].F; { use min F in S_i to compute F(i) }
        while (tail ≤ head) and (temp ≤ Result[tail].F) do { delete using Lemma 1 }
            tail := tail + 1;
        { Store F(i) }
        tail := tail - 1;
        Result[tail].F := temp;
        Result[tail].q := i;
    end; { of for }
    while (Q[1] - Q[Result[head].q + 1] > W) do
        head := head - 1;
    MinimizeHtStandard := Result[head].F;
end; { of MinimizeHtStandard }

Figure 3.2. Procedure to obtain a minimum height folding

Each time the pointer head or tail is moved in a while loop, an $F(\cdot)$ value is deleted. So the cost of these loop iterations can be charged to the deletions. The remaining code within the for loop takes $O(1)$ time per iteration, or $O(n)$ in all. No more than $n$ deletions can take place, so the complexity of procedure MinimizeHtStandard is $O(n)$. Using standard dynamic programming traceback techniques [10], the fold points can be obtained in an additional $O(n)$ time.
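Procedure MinimizeHtStandard can be sketched in Python, with `collections.deque` standing in for the array Result. The left end of the deque plays the role of tail and the right end plays the role of head. This is a minimal illustration under the same 1-based conventions as the equations (each $w_i \le W$, $l_n = 0$) and is not the code used by the authors. The function name is hypothetical. The sketch returns only the minimum total channel height and performs no traceback.

```python
from collections import deque


def min_height_standard(w, l, W):
    """Sketch of MinimizeHtStandard (Equations 3.4 and 3.5).

    w[1..n]: cell widths, each <= W; l[1..n]: channel heights, l[n] = 0.
    The deque holds pairs (q, F(q)); q increases and F(q) strictly
    decreases from the left end (tail) to the right end (head).
    """
    n = len(w) - 1
    Q = [0] * (n + 2)                  # suffix sums: w_{is} = Q[i] - Q[s + 1]
    for i in range(n, 0, -1):
        Q[i] = Q[i + 1] + w[i]
    S = deque([(n, 0)])                # F(n) = 0
    for i in range(n - 1, 0, -1):
        # Observation 1: drop entries with w_{i+1,q} > W from the head.
        # The entry q = i + 1 always survives because w[i + 1] <= W.
        while Q[i + 1] - Q[S[-1][0] + 1] > W:
            S.pop()
        F_i = l[i] + S[-1][1]          # Equation 3.4: the head holds min F
        # Lemma 1: entries at the tail with F(q) >= F(i) are never needed.
        while S and F_i <= S[0][1]:
            S.popleft()
        S.appendleft((i, F_i))
    # Equation 3.5: the first row C_1..C_q must satisfy w_{1q} <= W.
    while Q[1] - Q[S[-1][0] + 1] > W:
        S.pop()
    return S[-1][1]
```

Each index enters the deque once and leaves it at most once, which is the same amortization argument as in the text.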

Problem 2 minimizes total area rather than just routing area, and it may be solved in a similar way. Let $f(i, s)$, $i \le s$, now denote the minimum chip height for the component list $C_i, \ldots, C_n$, assuming the first fold is at $C_s$. As before, $f(n, n) = 0$ and Equation 3.1 holds for $i < s$. Equation 3.2 needs to be replaced by

$$f(i, i) = \min_{i < q \le n} \{ f(i+1, q) + l_i + h \} \tag{3.6}$$

Using Equations 3.1 and 3.6 and the development for problem 1, an $O(n)$ time algorithm for problem 2 may be obtained.
3.2.2 Height Constrained Case (Problems 3-4)

The solutions to problems 3 and 4 are similar. Both use parametric search, and we describe only the solution to problem 3. Since the total height of the routing channels is fixed at $H$, the area assigned for routing is minimized by minimizing the chip width $W$. To use parametric search to minimize $W$, we must do the following:

1. Identify a set of candidate values for the minimum $W$. This set must be provided as a sorted matrix with the property that each matrix entry can be computed in constant time.

2. Provide a way to determine if a candidate width $W$ is feasible; i.e., can the component stack be folded using total channel height $H$ and width $W$?

For the feasibility test of 2, we can use procedure MinimizeHtStandard of Figure 3.2. We set $W$ to the candidate value being tested and then determine whether MinimizeHtStandard $\le H$ following the execution of the procedure.

Next, we provide an $n \times n$ sorted matrix $M$ of candidate values ($n$ is the total number of components in the component list). To determine the candidate matrix $M$, we observe that the width of any layout is given by $\sum_{k=i}^{j} w_k$ for some $i, j$, $1 \le i \le j \le n$. This formula gives us the width of the segment that contains components $C_i$ through $C_j$. We now show how the elements of $M$ may be computed efficiently given the index pair $(i, j)$. Let

$$T_i = \sum_{j=i}^{n} w_j, \quad 1 \le i \le n$$

and let $T_{n+1} = 0$. Then,

$$M_{ij} = \begin{cases} T_{n-i+1} - T_{j+1}, & i + j \ge n + 1 \\ 0, & i + j < n + 1 \end{cases}$$

$M$ is a sorted matrix that contains all candidate values. The minimum $M_{ij}$ for which a height-$H$ folding is possible is the width of the minimum width height-$H$ folding.


So, if we precompute the $T_i$'s, each $M_{ij}$ can be determined in constant time. The precomputation of the $T_i$'s takes $O(n)$ time. Since feasibility testing takes linear time, it follows from Corollary 2 that the complexity of the described parametric search to find the minimum width folding is $O(n + t(n)\log n) = O(n + n\log n) = O(n\log n)$.
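The candidate matrix for problem 3 can be evaluated the same way as the stack folding matrix of Chapter 2. Below is a minimal Python sketch. It assumes 1-based widths `w[1..n]` with an unused index 0, and the function name is hypothetical. For nonnegative widths, this $M$ is sorted without any preprocessing, as the test checks.

```python
def make_width_candidates(w):
    """Return M(i, j), 1 <= i, j <= n, for the height-constrained search.

    w[1..n]: cell widths (index 0 unused). T[i] = w[i] + ... + w[n] and
    T[n + 1] = 0, so the segment C_a..C_b has width T[a] - T[b + 1].
    """
    n = len(w) - 1
    T = [0] * (n + 2)
    for i in range(n, 0, -1):
        T[i] = T[i + 1] + w[i]

    def M(i, j):
        if i + j < n + 1:
            return 0
        return T[n - i + 1] - T[j + 1]   # width of C_{n-i+1} .. C_j

    return M
```

With widths 2, 3, 2, 4, for instance, `M(3, 3)` is the width of $C_2, C_3$, namely $3 + 2 = 5$.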


3.3 Standard Cell Folding (Problems 5-7)


In this section, we deal with layouts that have fixed channel area, e.g., semi-custom chips in which each routing channel is of the same height.

3.3.1 Minimum Channel Height (Problem 5)

We may view the result of any width-$W$ folding as the transformation of the component list $C_1, \ldots, C_n$ into a new component list $B_1, \ldots, B_k$, $k \le n$, where $B_i$ represents the components folded into row $i$ of the layout. The width of each $B_i$ equals the sum of the widths of the components assigned to cell row $i$, and this is $\le W$. Also, the routing channel between rows $i$ and $i+1$ must have height at least equal to $l_{j_i}$, where $C_{j_i}$ is the last component assigned to cell row $i$. We see that

$$\mathit{width}(B_i) = \sum_{q=j_{i-1}+1}^{j_i} w_q \le W \quad \text{and} \quad \mathit{height}(\text{channel } i) \ge l_{j_i}$$

where $j_0 = 0$. When all channel heights are the same, the common height must be at least $\max_{1 \le i \le k} \{ l_{j_i} \}$.

With this knowledge, we can develop a greedy algorithm to minimize channel height. In it, we repeatedly combine pairs of components (this is equivalent to assigning them to the same cell row, or $B_i$) so that no created component has width greater than W. The pairs are chosen in nonincreasing order of $l_i$. The greedy algorithm is given in Figure 3.3. Each set of combined components is represented by a pointer, last, from the first component to the last, and another pointer, first, from the last component to the first. The width of a combined component is kept in its first elementary component.

In the algorithm of Figure 3.3, the first for loop initializes every combined component block to consist of a single elementary component. The sort gives the order in which the l's are to be "eliminated" so that the maximum of the remaining l's is as small as possible. In the while loop, l's are eliminated by combining blocks. This continues until the next highest l cannot be eliminated (we assume that $\sum_i w_i > W$, so it is not possible to eliminate all l's). The highest remaining l is l[p[i]], and this is the smallest channel height needed.

Procedure MinChannelHeight;
begin
  for i := 1 to n do { initialize component blocks; width[i] holds w_i }
  begin
    first[i] := i; last[i] := i;
  end;

  Sort p[1..n] = [1, 2, ..., n] so that
    l[p[i]] ≥ l[p[i+1]], 1 ≤ i < n;

  i := 1;
  while (width[first[p[i]]] + width[p[i]+1] ≤ W) do
  begin { eliminate l[p[i]] by merging the blocks on both sides of the fold }
    width[first[p[i]]] := width[first[p[i]]] + width[p[i]+1];
    first[last[p[i]+1]] := first[p[i]];
    last[first[p[i]]] := last[p[i]+1];
    i := i + 1;
  end;

  MinChannelHeight := l[p[i]];
end;

Figure 3.3. Procedure to obtain a minimum channel height folding

The correctness of the procedure is easily established. As for its complexity, every step except the sort takes O(n) time. The sort can be done in O(n log n) time. However, in practice, $\max\{l_i\} - \min\{l_i\} = O(n)$, and the sort can then be done in O(n) time using a radix sort with radix O(n) (i.e., a bin sort) [11]. One may also verify that the minimum number of cell rows needed is obtained by doing a greedy folding on the combined components that remain when procedure MinChannelHeight terminates.
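To make the block bookkeeping concrete, here is a small Python sketch of Figure 3.3 followed by the greedy row packing just mentioned. It is our illustration, not the thesis's C program, and the function names are hypothetical. Assumptions: indices are 0-based, l[i] is the channel height needed if a fold follows component i, every w[i] is at most W, and Python's comparison sort stands in for the bin sort.

```python
def min_channel_height(w, l, W):
    """Greedy of Figure 3.3 (0-based).

    Returns (minimum common channel height, blocks), where each block is a
    (first, last, width) triple of components that must share a cell row.
    """
    n = len(w)
    first = list(range(n))   # valid at the last component of each block
    last = list(range(n))    # valid at the first component of each block
    width = list(w)          # block width, stored at the block's first component
    # Only the n-1 gaps between consecutive components can become folds.
    order = sorted(range(n - 1), key=lambda g: l[g], reverse=True)
    k = 0
    while k < len(order):
        g = order[k]            # gap between components g and g+1
        a, b = first[g], g + 1  # block ending at g, block starting at g+1
        if width[a] + width[b] > W:
            break               # l[g] cannot be eliminated
        width[a] += width[b]    # merge the two blocks
        first[last[b]] = a
        last[a] = last[b]
        k += 1
    height = l[order[k]] if k < len(order) else 0
    blocks, s = [], 0
    while s < n:
        blocks.append((s, last[s], width[s]))
        s = last[s] + 1
    return height, blocks


def greedy_rows(block_widths, W):
    """Pack blocks left to right, opening a new row only when a block does not fit."""
    rows, used = 1, 0
    for bw in block_widths:
        if used + bw > W:
            rows, used = rows + 1, 0
        used += bw
    return rows
```

For example, with widths [3, 4, 2, 5, 1], penalties [5, 1, 7, 2, 0] and W = 8, the sketch returns a channel height of 1 with blocks of widths 7 and 8, which pack into 2 rows.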
3.3.2 Minimize Chip Area Subject to Width Constraint (Problem 6)

First, consider a modified version of problem 6 in which, in addition to the chip width W, we are given the height L of each routing channel. We are to fold the components so as to minimize the total chip area. To solve modified problem 6 in linear time, we first make a pass over all the components and combine components $C_i$ and $C_{i+1}$ whenever $l_i > L$. If any resulting component has width > W, then L is an infeasible channel height. After the blocks have been combined in this way, they are packed into cell rows greedily (i.e., a new cell row is started only if the component being placed does not fit in the current cell row). It is easily verified that this minimizes the number of cell rows and hence the chip area.
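The linear-time pass just described can be sketched in Python as follows, together with an outer loop that tries each distinct $l_i$ as the channel height L, which is how problem 6 itself is solved. This is an illustration under assumptions that are ours, not the thesis's: indices are 0-based, all standard cells have a common height h, the chip height with r cell rows is taken to be r*h + (r - 1)*L (one channel between consecutive rows), and the function names are hypothetical.

```python
def rows_for_channel_height(w, l, W, L):
    """Modified problem 6: fewest cell rows when every channel has height L.

    Returns None if L is infeasible (some forced block is wider than W).
    """
    blocks, cur = [], w[0]
    for i in range(len(w) - 1):
        if l[i] > L:            # a fold after component i would need a taller channel,
            cur += w[i + 1]     # so components i and i+1 must share a row
        else:
            blocks.append(cur)
            cur = w[i + 1]
    blocks.append(cur)
    if max(blocks) > W:
        return None
    rows, used = 1, 0           # greedy packing: new row only when a block does not fit
    for b in blocks:
        if used + b > W:
            rows, used = rows + 1, 0
        used += b
    return rows


def min_area_width_constrained(w, l, W, h):
    """Problem 6: try each distinct l value as L; return (area, L, rows) or None."""
    best = None
    for L in sorted(set(l[:-1])):   # l[-1] follows the last component: never a channel
        r = rows_for_channel_height(w, l, W, L)
        if r is None:
            continue
        area = W * (r * h + (r - 1) * L)    # assumed height model
        if best is None or area < best[0]:
            best = (area, L, r)
    return best
```

Each call to rows_for_channel_height is linear, so the outer loop costs $O(n^2)$ overall.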

Problem 6 can be solved using the solution to modified problem 6 by trying all O(n) possible values for L (i.e., the distinct $l_i$'s) and seeing which one minimizes the overall area. (Actually, only $l_i$'s that are no less than the minimum feasible L, as determined by problem 5, need be tried.) The resulting complexity is $O(n^2)$.

3.3.3 Minimize Chip Area Subject to Height Constraint (Problem 7)

As for problem 6, we define a modified problem 7 in which the channel height L is known. This modified problem is solved using parametric search. The candidate values are described by the same M matrix as used in Section 3.2.2, and the solution to modified problem 6 is used for the feasibility test. This enables us to solve the modified version of problem 7 in O(n log n) time. Now, by trying all O(n) possible L values (as in Section 3.3.2), the minimum area folding can be determined. The overall time complexity is $O(n^2 \log n)$.
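For modified problem 7, the following Python sketch returns the smallest width whose folding fits within height H for a fixed channel height L. It is a simplified stand-in for the parametric search, not the thesis's procedure: rather than searching the sorted matrix M, it materializes all $O(n^2)$ segment widths, sorts them, and binary-searches with the feasibility test, which costs $O(n^2 \log n)$ instead of $O(n \log n)$. Binary search is valid because the greedy packing can only use fewer rows as the width grows. The height model r*h + (r - 1)*L and all names are our assumptions.

```python
def rows_for_channel_height(w, l, W, L):
    """Fewest cell rows of width W when every channel has height L (None if infeasible)."""
    blocks, cur = [], w[0]
    for i in range(len(w) - 1):
        if l[i] > L:            # cannot fold after component i
            cur += w[i + 1]
        else:
            blocks.append(cur)
            cur = w[i + 1]
    blocks.append(cur)
    if max(blocks) > W:
        return None
    rows, used = 1, 0
    for b in blocks:
        if used + b > W:
            rows, used = rows + 1, 0
        used += b
    return rows


def min_width_height_constrained(w, l, H, L, h):
    """Modified problem 7 for a fixed channel height L; None if no width works."""
    n = len(w)
    prefix = [0]
    for x in w:
        prefix.append(prefix[-1] + x)
    # Every candidate width is a segment sum w_i + ... + w_j (an entry of M).
    cands = sorted({prefix[j] - prefix[i] for i in range(n) for j in range(i + 1, n + 1)})

    def feasible(x):
        r = rows_for_channel_height(w, l, x, L)
        return r is not None and r * h + (r - 1) * L <= H

    if not feasible(cands[-1]):
        return None
    lo, hi = 0, len(cands) - 1
    while lo < hi:              # smallest feasible candidate
        mid = (lo + hi) // 2
        if feasible(cands[mid]):
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]
```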

3.4 Custom Cell Folding (Problems 8 and 9)

In this section, we relax the requirement that all components have the same height h. Let $h_i$ be the height of $C_i$. If $C_i, \ldots, C_j$ are assigned to the same cell row and no other components are assigned to this row, then the cell row height is

$$\max_{i \le q \le j} \{h_q\}.$$

The height of the folding is the sum of the heights of the cell rows and routing channels.

3.4.1 Width Constrained Folding (Problem 8)

Since the chip width is fixed at W, chip area is minimized by minimizing chip height. Let $R_{ij} = \max_{i \le q \le j} \{h_q\}$, $1 \le i \le j \le n$. Let f(i, s), $i \le s$, be the minimum height into which $C_i, \ldots, C_n$ can be folded such that the first fold is at $C_s$. Following the development of Section 3.2.1, we see that $f(n,n) = h_n$, and for i < s,

$$f(i,s) = \begin{cases} \infty, & \text{if } w_{i,s} > W \\ f(s,s) + R_{i,s} - h_s, & \text{otherwise} \end{cases} \qquad (3.7)$$

and for i = s,

$$f(i,i) = \min_{i < q \le n} \{ f(i+1,q) + h_i + l_i \} \qquad (3.8)$$

The minimum height into which the folding can be done is $\min_{1 \le s \le n} \{f(1,s)\}$. As described in Section 3.2.1, this set of dynamic programming equations can be solved in $O(n^2)$ time. However, the development of Section 3.2.1 that results in an O(n) time solution does not apply to the new set of equations. Instead, we are able to solve problem 8 in O(n log n) time.

Procedure MinimizeHtCustom;
{ Compute the minimum height folding }
begin
  head := n; tail := n; left := n; right := n;
  for i := 1 to n do
    Hlist[i].gvalue := ∞;
  Flist[tail].q := n; Flist[tail].F := 0;
  Hlist[n].top := tail; Hlist[n].bottom := tail;
  Hlist[n].hvalue := h[n];
  Hlist[n].gvalue := Hlist[n].hvalue + Flist[Hlist[n].top].F;
  InitializeWinnerTree(T);
  for i := n - 1 downto 1 do
  begin
    DeleteValue(i);
    InsertValue(i);
  end; { of for }
  DeleteValue(0);
  MinimizeHtCustom := Winner of the tree T;
end; { of MinimizeHtCustom }

Figure 3.4. Procedure to obtain a minimum height folding for custom cells
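For reference, Equations 3.7 and 3.8 can be evaluated directly in $O(n^2)$ time, as in the following Python sketch, which is useful for checking a faster implementation on small inputs. It is our illustration with 0-based indices and hypothetical names; it assumes every $w_i \le W$ and that l[i] is the height of the channel that follows a row ending at $C_i$. For each candidate first row it scans rightward while maintaining the row width and its tallest component $R_{i,s}$, stopping once the width exceeds W.

```python
def min_height_custom(w, h, l, W):
    """Minimum folding height for custom cells via Equations 3.7 and 3.8."""
    n = len(w)
    INF = float("inf")
    ff = [INF] * n                  # ff[s] = f(s, s): the first row is C_s alone

    def best_from(i):
        """min over s >= i of f(i, s), using Equation 3.7."""
        best, width, tallest = INF, 0, 0
        for s in range(i, n):
            width += w[s]
            if width > W:           # longer rows are only wider
                break
            tallest = max(tallest, h[s])        # R_{i,s}
            best = min(best, ff[s] + tallest - h[s])
        return best

    ff[n - 1] = h[n - 1]            # f(n, n) = h_n
    for i in range(n - 2, -1, -1):  # Equation 3.8
        ff[i] = h[i] + l[i] + best_from(i + 1)
    return best_from(0)             # min over s of f(1, s)
```

With widths [2, 2, 2], heights [1, 3, 2], channel heights [4, 1, 0] and W = 4, the best folding puts $C_1, C_2$ in the first row and $C_3$ in the second, for a height of 3 + 1 + 2 = 6.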

Define $F(i) = f(i,i) - h_i$. Substituting into Equation 3.7, we get

$$f(i,s) = \begin{cases} \infty, & \text{if } w_{i,s} > W \\ F(s) + R_{i,s}, & \text{otherwise} \end{cases} \qquad (3.9)$$

From Equation 3.8, we get

$$\begin{aligned} F(i) = f(i,i) - h_i &= \min_{i < q \le n} \{ f(i+1,q) \} + l_i \\ &= l_i + \min\Big\{ f(i+1,i+1),\ \min_{i+1 < q \le n} \{ f(i+1,q) \} \Big\} \\ &= l_i + \min\Big\{ F(i+1) + h_{i+1},\ \min_{i+1 < q \le n,\ w_{i+1,q} \le W} \{ F(q) + R_{i+1,q} \} \Big\} \\ &= l_i + \min_{i < q \le n,\ w_{i+1,q} \le W} \{ F(q) + R_{i+1,q} \} \qquad (3.10) \end{aligned}$$

The height of the minimum height folding is

$$\min_{1 \le i \le n,\ w_{1,i} \le W} \{ F(i) + R_{1,i} \} \qquad (3.11)$$

Beginning with $F(n) = f(n,n) - h_n = 0$, the remaining F's may be computed in the order $F(n-1), \ldots, F(1)$ by using Equation 3.10. To use Equation 3.10, we keep a multiset $S_i$ of F values as in Section 3.2.1. We begin with $S_n = \{F(n)\}$ and rewrite Equation 3.10 as:

$$F(i) = l_i + \min_{F(q) \in S_i} \{ F(q) + R_{i+1,q} \} \qquad (3.12)$$

Observation 1 of Section 3.2.1 applies to Equation 3.12, and we may eliminate from $S_i$ any F(q) for which $w_{i+1,q} > W$.

Observation 2: $R_{i,q} \ge R_{i,q-1} \ge \cdots \ge R_{i,i}$, $1 \le i \le q \le n$.

Using Observation 2 and Equation 3.12, we can show that Lemma 1 applies to the computation of the F's as defined in this section.

Observation 3: If $h_j > h_q$ and $i \le j < q$, then $R_{i,q} \ne h_q$. Also, if $h_j \ge h_{j+1}$ and $i \le j$, then $R_{i,j} = R_{i,j+1}$.

Now, we devise a method to find the minimum in Equation 3.12 efficiently. We store the F(.) values in an array of records called Flist. Each Flist record has two fields, Flist.q and Flist.F, with Flist.F = F(Flist.q); e.g., if F(8) = 50, then there is a record with Flist.q = 8 and Flist.F = 50. Two pointers, head and tail, are used. Initially, head = tail = n. At any point, head and tail have values such that head ≥ tail and F(tail) > F(tail+1) > ... > F(head). This data structure is the same as the one used in Section 3.2.1.

Procedure DeleteValue(i);
{ Delete F(t) such that w_{i+1,t} > W }
begin
  done := false; bool := false;
  while (not done) do
    if (Q[i+1] - Q[Flist[Hlist[right].top].q + 1] > W) then
    begin { Delete this F(.) value }
      Hlist[right].top := Hlist[right].top - 1;
      head := head - 1;
      bool := true;
      if (Hlist[right].top < Hlist[right].bottom) then
      begin { Make this record inactive }
        Hlist[right].gvalue := ∞;
        AdjustWinnerTree(T, right);
        right := right - 1; bool := false;
      end; { of if }
    end { of if }
    else done := true;
  if bool then
  begin
    Hlist[right].gvalue := Hlist[right].hvalue + Flist[Hlist[right].top].F;
    AdjustWinnerTree(T, right);
  end; { of if }
end; { of DeleteValue }

Figure 3.5. Procedure to delete F(.) values as in Observation 1

When computing F(i), we need to associate F(q) values with $R_{i+1,q}$ values, generate the values $F(q) + R_{i+1,q}$, and find the minimum of these values. Suppose $h_q \ge h_{q+1}$; then $R_{i,q} = R_{i,q+1}$ by Observation 3, and we associate both F(q) and F(q+1) with $R_{i,q}$. In general, if $R_{i,q} = R_{i,q+1} = R_{i,q+2} = \cdots = R_{i,t}$, then we have a single Hlist record with the value $R_{i,q}$ and, associated with it, the values $F(q), F(q+1), \ldots, F(t)$. These F(.) values must satisfy the condition $F(q) > F(q+1) > \cdots > F(t)$; F(.) values that violate it can be removed as in Lemma 1 by a left-to-right scan. We use an array Hlist of n records with the fields Hlist.hvalue, representing the height, and Hlist.top and Hlist.bottom, the two pointers that keep track of the F(.) values associated with the record. The top and bottom pointers delimit F(.) values satisfying Flist[Hlist.top].F < Flist[Hlist.top - 1].F < ... < Flist[Hlist.bottom].F; that is, Flist[Hlist.top].F is the minimum F(.) value associated with the record. Every F(.) value is associated with a unique Hlist record. We generate the value Flist[Hlist.top].F + Hlist.hvalue (which is $F(q) + R_{i,q}$) and store it in Hlist.gvalue (generated value). These generated values are used to construct a winner tree T (see Horowitz and Sahni [11]).
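The winner tree is only cited here ([11]). As a hedged illustration, the following Python sketch is a generic array-based min winner tree, our own layout rather than the one in [11] (class and method names are hypothetical). Each internal node stores the index of the smaller of its two children, so the root names the minimum; changing one leaf, for example setting an inactive record's gvalue to infinity, replays only the O(log n) matches on that leaf's path to the root, which is the cost the text attributes to AdjustWinnerTree.

```python
class WinnerTree:
    """Min winner tree over values[0..n-1] (n >= 1); the root holds the index of the minimum."""

    def __init__(self, values):
        self.values = list(values)
        n = len(self.values)
        self.size = 1
        while self.size < n:
            self.size *= 2
        self.tree = [-1] * (2 * self.size)      # -1 marks an empty leaf
        for i in range(n):
            self.tree[self.size + i] = i        # leaves hold record indices
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self._winner(self.tree[2 * node], self.tree[2 * node + 1])

    def _winner(self, a, b):
        """Index of the smaller value; empty leaves always lose."""
        if a < 0:
            return b
        if b < 0:
            return a
        return a if self.values[a] <= self.values[b] else b

    def adjust(self, i, value):
        """Set values[i] and replay the matches on its leaf-to-root path."""
        self.values[i] = value
        node = (self.size + i) // 2
        while node >= 1:
            self.tree[node] = self._winner(self.tree[2 * node], self.tree[2 * node + 1])
            node //= 2

    def winner(self):
        return self.values[self.tree[1]]
```

In the setting of MinimizeHtCustom, leaf k would hold Hlist[k].gvalue, and winner() would give the minimum used to compute F(i).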

The winner of the tree T is the minimum we are looking for when computing F(i). Let an Hlist record be active if Hlist.gvalue ≠ ∞. The pointers left and right, left ≤ right, delimit the currently active list of Hlist records: Hlist[left] is the leftmost active record and Hlist[right] is the rightmost active record.

The procedure MinimizeHtCustom is given in Figure 3.4. It first initializes the pointers and the winner tree T. In the procedure DeleteValue(i), the F(t) values that satisfy the condition of Observation 1, i.e., F(t) values such that $w_{i+1,t} > W$, are deleted. Let $Q[j] = w_j + w_{j+1} + \cdots + w_n$, with $Q[n+1] = 0$, so that $w_{i+1,t} = Q[i+1] - Q[t+1]$.

The procedure DeleteValue is given in Figure 3.5. The boolean bool records whether the rightmost active Hlist record has lost F(.) values while remaining active, in which case its gvalue must be recomputed. When a record loses all of its F(.) values, it is made inactive and the pointer right is moved left to the next active Hlist record. In either case, the winner tree T is adjusted to update the current minimum. A call to AdjustWinnerTree takes O(log n) time [11]. The winner tree T is adjusted at most twice whenever an F(.) value is deleted. Let the number of deletions made when DeleteValue is invoked be x. Then the time complexity of DeleteValue is O(x log n).

The procedure InsertValue(i) first finds the winner of the tree T. This is added to l[i] to get F(i), as in Equation 3.12. Once we find F(i), we insert an Hlist record with Hlist.hvalue = h[i], and the F(.) value is inserted into the array of Flist records. The winner tree T is then adjusted. The first while loop of InsertValue checks the conditions of Lemma 1; if they apply, the dominated F(.) values are deleted and the winner tree is adjusted. Let the number of F(.) value deletions be y. The second while loop of InsertValue checks whether the conditions of Observation 3 apply. If so, the F(.) records of the adjacent Hlist record are added to the current Hlist record, the record is moved, and the winner tree is adjusted. Every time the conditions of Observation 3 apply in this while loop, two adjacent Hlist records are merged, at a cost of O(log n) time. Let the number of merges in a single invocation of InsertValue be z. Then a single invocation of InsertValue in which y F(.) values are deleted and z Hlist merges take place runs in O((y + z + 1) log n) time.

Procedure InsertValue(i);
begin
  left := left - 1; tail := tail - 1;
  Flist[tail].q := i;
  Flist[tail].F := Winner of the min tree T + l[i];
  Hlist[left].hvalue := h[i];
  Hlist[left].top := tail; Hlist[left].bottom := tail;
  Hlist[left].gvalue := Hlist[left].hvalue + Flist[Hlist[left].top].F;
  AdjustWinnerTree(T, left);

  while (head ≠ tail and Flist[tail].F ≤ Flist[tail+1].F) do
  begin { Lemma 1: the F(.) value in Flist[tail+1] is dominated; delete it }
    Hlist[left+1].bottom := Hlist[left+1].bottom + 1;
    if (Hlist[left+1].bottom > Hlist[left+1].top) then
    begin { record left+1 is now empty }
      Hlist[left+1] := Hlist[left]; { Move the record }
      Hlist[left].gvalue := ∞;
      AdjustWinnerTree(T, left);
      left := left + 1;
      AdjustWinnerTree(T, left); { the moved record now occupies this leaf }
    end; { of if }
    Flist[tail+1] := Flist[tail]; { Move the record }
    tail := tail + 1;
    Hlist[left].top := tail; Hlist[left].bottom := tail;
  end; { of while }

  while (left ≠ right and Hlist[left].hvalue ≥ Hlist[left+1].hvalue) do
  begin { Conditions of Observation 3 apply }
    Hlist[left].top := Hlist[left+1].top;
    Hlist[left+1] := Hlist[left]; { Move the record }
    Hlist[left].gvalue := ∞;
    AdjustWinnerTree(T, left);
    left := left + 1;
    Hlist[left].gvalue := Hlist[left].hvalue + Flist[Hlist[left].top].F;
    AdjustWinnerTree(T, left);
  end; { of while }
end; { of InsertValue }

Figure 3.6. Procedure to insert F(.) values

Note that no more than n F(.) values can be deleted in total, and no more than n Hlist records can be merged in total. This implies that the total time taken by the procedure MinimizeHtCustom is O(n log n). In contrast, the algorithm of [21] for the same problem takes $O(n^2)$ time.
3.4.2 Height Constrained Folding (Problem 9)

To obtain the minimum width folding, given the height H of the folding, we use parametric search in conjunction with the procedure MinimizeHtCustom developed in Section 3.4.1, which is used for feasibility testing: given a candidate width x of the layout, we test whether it is possible to obtain a folding whose height is at most H. The set of candidate values is the same as the one described in Section 3.2.2. Feasibility testing takes O(n log n) time, so from Corollary 2, the total time taken to obtain the minimum width folding is $O(n + n\log n \cdot \log n) = O(n\log^2 n)$. The same problem is solved in $O(n^2 \log n)$ time in [21].

Table 3.1. Heights produced by width-constrained standard cell folding algorithms

     n     Greedy      Ours
   100      624.4     609.8
   400     1979      1961.2
  1000     3813.7    3721.2


3.5 Experimental Results

The procedure MinimizeHtStandard (Figure 3.2) was programmed in C and run on a SUN 4 workstation. The solutions produced by MinimizeHtStandard were compared with those obtained using the greedy heuristic of [26].

The test data were produced by taking a linearly ordered list of modules and making interconnections between the modules using a random number generator. The connections were biased so that there are many connections between modules that are close together. On every test instance, our algorithm produced better solutions than the greedy heuristic; the results are given in Table 3.1. The results shown are the average of 10 runs for each n. On average, our algorithm took 2 to 3 times as long as the greedy heuristic to arrive at a solution.

The algorithm MinimizeHtCustom was programmed, and its run times were compared with those of the algorithm of [21]. Both programs were written in C. The results of the experiments are shown in Table 3.2. Our algorithm is considerably faster than that of [21]. Since both algorithms generate optimal solutions, the chip area is the same using either.

Table 3.2. Run times of width-constrained folding algorithms for custom cells

     n      Ours      [21]
    64      2.56     23.80
   250     10.9     350.3
  1000     39.23   6125.5

Times are in milliseconds

3.6 Conclusions

We have developed optimal algorithms to fold a linearly ordered list of standard and custom cells. Several optimization constraints were considered, resulting in a total of nine problem formulations. Two of these correspond to problem formulations for the bit-slice stack folding problem studied in [21]. The algorithms we have developed for these two cases are asymptotically superior to those developed in [21]. Experiments with one of them show that this asymptotic superiority translates into much lower execution time. For the other formulations, heuristics were proposed by Shragowitz et al. [25]. Our algorithms have acceptable asymptotic complexity and guarantee optimal solutions. In fact, experiments conducted with one of them yielded foldings with smaller chip area on all tested instances.

CHAPTER 4
PLANAR TOPOLOGICAL ROUTABILITY


4.1      Introduction 

The problem of routing two-pin nets on a single layer has been studied previously by several researchers. The river routing and switchbox routing problems are special cases of this problem; efficient algorithms for them can be found in [12, 15, 17, 19, 20, 22, 23, 27, 29, 30]. In this chapter, we are concerned with the problem of routing (topologically) a collection of two-pin nets in a single layer or plane. We refer to this problem as the TPR problem. The input to the problem is a two-dimensional routing surface with a collection of modules placed in it (Figure 4.1(a)). We assume that no two modules touch. There are pins on the periphery of the modules. Pins with the same number define a net and are to be joined by an interconnect, or wire. In topological routing, we are concerned with defining wire paths. However, no underlying grid is assumed, and there is no minimum wire separation requirement. Thus wire paths can take any planar shape and may run arbitrarily close to each other. Wires are not permitted to intersect or run over modules. In Figure 4.1(a), the broken lines indicate wire paths. The routing instance (RI) of Figure 4.1(a) is topologically routable in a single layer, while that of Figure 4.1(b) is not. The TPR problem for RIs in which all modules lie on the boundary of the routing region (or

Figure 4.1. A planar routable case (a) and a nonplanar routable case (b)


more precisely, all pins are on the boundary of the region) was studied in [18, 12, 23], where a simple linear time algorithm for this version of the TPR problem was developed. For the case in which none of the modules are on the boundary, Pinter [23] has suggested using the linear time planarity testing algorithm of Hopcroft and Tarjan [9]. That approach is quite complex. Marek-Sadowska and Tarng [18] have considered the TPR problem and several variants, which include flippable modules and multiterminal nets. They develop a linear time algorithm for TPR that is based on module merging. In this chapter, we present, in Section 4.3, another linear time algorithm for the general TPR problem that is almost as simple as the one of [18, 12, 23] for the restricted TPR problem. This algorithm was developed by Lim [16], but the proof that the algorithm is correct was incomplete. In that section, we also present an algorithm for definite topological routing; that is, if the instance is topologically routable, we give an algorithm to determine the loose routes of the wires. The TPR algorithm is implemented differently than described in Section 4.3; the implementation issues are discussed in Section 4.5. Experimental results presented in Section 4.6 indicate that our algorithm is considerably faster than the TPR algorithm of Marek-Sadowska and Tarng [18], particularly if the routing instance is not planar routable. For the case of multipin nets, we show, in Section 4.4, that testing for topological routability is equivalent to graph planarity testing and that finding the maximum number of nets that are topologically routable is NP-complete. We also extend our two-pin algorithm to handle multipin instances in which the modules remain connected following the deletion of all nets with more than two pins.
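For intuition about the restricted case in which all pins lie on the boundary of the routing region, the following Python sketch uses the familiar bracket-matching idea: walk around the boundary, treat the two pins of each net like a pair of brackets, and declare the instance routable exactly when every pin is matched with the pin on top of a stack. This is our illustration (the function name is hypothetical), not a transcription of the algorithms in [18, 12, 23]; it assumes every net has exactly two pins and that pins are given in boundary order by net name.

```python
def boundary_routable(nets):
    """nets[k] is the net of the k-th pin met while walking around the boundary.

    Returns the wires as (position, position) pairs if the two-pin nets can be
    routed without crossings inside the region, otherwise None.
    """
    stack, wires = [], []
    for pos, net in enumerate(nets):
        if stack and nets[stack[-1]] == net:
            wires.append((stack.pop(), pos))   # innermost unmatched pin closes this net
        else:
            stack.append(pos)
    return wires if not stack else None
```

Because the boundary is a closed loop, the walk may start at any pin: a set of chords is noncrossing regardless of where the circle is cut, so rotating the input list does not change the verdict. The same stack discipline reappears in Step 2 of the algorithm of Section 4.3.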

4.2 Preliminaries

To simplify matters, we shall assume that TPR RIs that have modules on the boundary (Figure 4.2(a)) have been augmented by a set of nets that are required to be routed on the boundary, and that this routing, together with the module boundaries, encloses the routing region (Figure 4.2(b)). This augmentation may require the addition of corner modules (A, D, C of Figure 4.2(b)). The assumption is needed so that our algorithm can account for the constraint that one cannot route around a boundary module but can route around all other modules.

Figure 4.2. Augmentation: (a) RI; (b) augmented RI

Figure 4.3. An example to illustrate some terminology

A pin segment, $P = p_1 p_2 \cdots p_k$, is a sequence of pins on the boundary of a module such that $p_1, \ldots, p_k$ appear in this order when the module is traversed counterclockwise beginning at $p_1$. Some of the pin segments of the modules of Figure 4.3 are abcde of module 1; MLK and LKJGf of module 3; and AiF of module 2. Let last(P) and first(P), respectively, denote the last and first pins of segment P. Let $net(p_i)$ denote the net associated with pin $p_i$. Two pins $p_i$ and $p_j$ are to be connected by a wire iff $net(p_i) = net(p_j)$. A curve, $C = P_1 P_2 \cdots P_j$, is a sequence of pin segments such that $net(last(P_i)) = net(first(P_{i+1}))$, $1 \le i < j$. A curve $C = P_1 P_2 \cdots P_j$ is a closed curve iff $net(last(P_j)) = net(first(P_1))$. In Figure 4.3, $net(p_i)$ is the lowercase letter corresponding to $p_i$; so, net(h) = net(H) = h. Some of the curves of Figure 4.3 are Ih Habcdeg Gf FEDCBAi, j JGfM mlh, edcba ABCDE, and ABC cdeg GfM. Of these, Ih Habcdeg Gf FEDCBAi and edcba ABCDE are closed curves. With any curve $C = P_1 P_2 \cdots P_j$, we associate j - 1 wires (j in case C is closed). These, respectively, connect the pins $last(P_i)$ and $first(P_{i+1})$, $1 \le i < j$ (and $last(P_j)$ and $first(P_1)$ in the case of a closed curve). Note that the curves, closed curves, and wires associated with any RI depend only on the modules and the net-to-pin assignments; they are not a function of the layout of any of the wires.
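These definitions can be checked mechanically. In the following Python sketch, which is ours (function names hypothetical), every pin is named by a single letter whose lowercase form is its net, following the labeling of Figure 4.3; a pin segment is a string of pin names and a curve is a list of such strings.

```python
def net(pin):
    """Figure 4.3 convention: a pin's net is its lowercase letter."""
    return pin.lower()


def is_curve(segments):
    """net(last(P_i)) == net(first(P_{i+1})) for 1 <= i < j."""
    return all(net(a[-1]) == net(b[0]) for a, b in zip(segments, segments[1:]))


def is_closed_curve(segments):
    """A curve whose last pin shares a net with its first pin."""
    return is_curve(segments) and net(segments[-1][-1]) == net(segments[0][0])


def wires(segments):
    """The j-1 wires of a curve (j for a closed curve) as (last(P_i), first(P_{i+1}))."""
    pairs = [(a[-1], b[0]) for a, b in zip(segments, segments[1:])]
    if is_closed_curve(segments):
        pairs.append((segments[-1][-1], segments[0][0]))
    return pairs
```

Applied to the examples above, Ih Habcdeg Gf FEDCBAi and edcba ABCDE test as closed curves, while j JGfM mlh is a curve that is not closed.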

For any closed curve $C = P_1 P_2 \cdots P_j$, we define the following:

module(P_i)          ...  module corresponding to pin segment P_i
pins(module(P_i))    ...  set of all pins on module module(P_i)
pins(P_i)            ...  set of all pins on segment P_i
pins(C)              ...  set of all pins on curve C, $\bigcup_{i=1}^{j} pins(P_i)$
ext.pins(C)          ...  $\bigcup_{i=1}^{j} pins(module(P_i)) - pins(C)$

Note that it is possible that module(P_i) = module(P_j) for i ≠ j.

Lemma 2: [16] Let I be an RI that contains a closed curve C with respect to which there are two pins a ∈ pins(C) and b ∈ ext.pins(C) such that net(a) = net(b). Then I is not planar routable.

Figure 4.4. Two possibilities to connect a and b: (a) original situation; (b) connect terminals a and b; (c) re-routing of some net

Proof: Figure 4.4 shows two possibilities. It should be clear that no matter how the wires of C and the wire (a, b) are laid out, there must be an intersection between two of these. □
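Lemma 2 gives a direct infeasibility check for a given closed curve. The Python sketch below computes pins(C) and ext.pins(C) as defined above and tests the lemma's condition for one curve; how candidate curves are generated is the subject of the algorithm in Section 4.3. It assumes (our convention, following Figure 4.3) that pins have unique single-letter names whose lowercase form is the net, and that each module's pins are given as a string; the function name is hypothetical.

```python
def net(pin):
    """Figure 4.3 convention: a pin's net is its lowercase letter."""
    return pin.lower()


def lemma2_violated(curve, module_pins):
    """curve: list of (module, segment) pairs forming a closed curve C.
    module_pins: dict mapping each module to a string of all its pins.
    True if some a in pins(C) and b in ext.pins(C) share a net, so by
    Lemma 2 the RI is not planar routable."""
    pins_c = {p for _, seg in curve for p in seg}
    all_pins = {p for m, _ in curve for p in module_pins[m]}
    ext_pins = all_pins - pins_c
    nets_c = {net(p) for p in pins_c}
    return any(net(b) in nets_c for b in ext_pins)
```

For instance, if the segment ABDE on module 2 skips the pin C, while pin c lies on the curve segment edcba of module 1, then C is an external pin whose net appears on the curve and the check reports a violation.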

Lemma 3: [16] Let I be an RI that contains a closed curve $C = P_1 P_2 \cdots P_j$ and another curve $R = R_1 R_2 \cdots R_k$ such that $module(R_1) = module(P_d)$ for some d, $1 \le d \le j$, and $first(R_1) \in ext.pins(C)$ (see Figure 4.5). Assume that there exist two pins a and b such that a ∈ pins(C), b ∈ ext.pins(C) ∪ pins(R), and net(a) = net(b). Then I is not planar routable.

Proof: Follows from Lemma 2. □

Figure 4.5. Another not planar routable situation

Two modules are connected iff there is a curve $C = P_1 P_2 \cdots P_j$ such that both modules are in $\bigcup_{i=1}^{j} module(P_i)$. A connected component (or simply component) is a maximal set of modules that are pairwise connected. It is easy to see that the connected components of an RI are disjoint. A boundary component is a connected component that includes at least one boundary module. Note that an RI with no boundary modules has no boundary component, while an RI with at least one boundary module has exactly one boundary component (this is because RIs with boundary components have been augmented as in Figure 4.2(b)).
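Because a pin segment may begin and end at any two pins of a module, two modules are connected exactly when a chain of nets leads from one to the other. The components can therefore be found with a standard union-find pass over the nets, as in the following Python sketch. It is our illustration (function name hypothetical), in which each two-pin net is given by the pair of modules that hold its pins.

```python
def components(modules, nets):
    """modules: iterable of module ids; nets: (module_u, module_v) pairs, one per net.
    Returns the connected components as a sorted list of sorted lists."""
    parent = {m: m for m in modules}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in nets:                       # each net joins the modules of its two pins
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    groups = {}
    for m in parent:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())
```

A net whose two pins lie on the same module joins nothing, and a module with no nets forms a component by itself.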

Lemma 4: An RI is topologically routable iff its components are (independently) topologically routable.

Proof: It is easy to see that if the RI is topologically routable, then each of its components is topologically routable. Assume that each component is topologically routable. Order the components of the RI so that the boundary component (if any) is first; the remaining components are in arbitrary order. Let the components in this order be $K_1, K_2, \ldots, K_k$. If k = 1, then there is nothing to prove. So, assume k > 1. We shall show how to construct a topological routing for $K_1, K_2, \ldots, K_a$ from a topological routing for $K_1, \ldots, K_{a-1}$ and one for $K_a$, $2 \le a \le k$. First, since a > 1, $K_a$ is not a boundary component. So, it is possible to surround it by a closed, non-self-intersecting line such that the region enclosed by this line includes exactly those modules that are in $K_a$ and no module touches the line. The region enclosed by this closed line has the property that any two points in it can be joined by a line (not necessarily straight) that lies wholly within the region. We refer to the surrounding line as the envelope of $K_a$. One way to obtain an envelope of $K_a$ is to first construct a set of $|K_a| - 1$ lines (not necessarily straight; $|K_a|$ is the number of modules in $K_a$) so that the modules of $K_a$ together with these lines form a connected component in the graph-theoretic sense (see Figure 4.6). These lines do not touch or cross any of the modules of the RI. This construction is possible because every pair of modules of an RI can be connected by such a line. The lines and modules define a spanning tree for $K_a$. By fattening the lines as in Figure 4.6(c), the envelope is obtained. It is easy to see that if $K_a$ is topologically routable, then it is topologically routable within the envelope so defined. So, use such a topological routing for $K_a$. When this routing is embedded into the routing for $K_1, \ldots, K_{a-1}$, some of the topologically routed wires of $K_1, \ldots, K_{a-1}$ may intersect (or touch) the envelope of $K_a$. However, none of these wires originates or terminates inside the envelope of $K_a$, so they can be rerouted along the contour of the envelope (Figure 4.7). □

Figure 4.6. Constructing the envelope of a component: (b) spanning tree; (c) envelope

Figure 4.7. Re-routing to free an independent component: (a) intersections; (b) re-routing

As a result of Lemma 4, we need only concern ourselves with the case when the RI has a single component.
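Lemma 4 suggests a simple preprocessing pass: split the RI into components and handle each one separately. The Python sketch below is one way to do this with union-find. It assumes (this is not spelled out in the text above) that a component is a connected component of the graph whose vertices are the modules and in which each net joins all modules that carry its pins. The input format (a dict from module to its pin list and a dict from pin to net name) and the name `ri_components` are hypothetical, and identifying which component touches the routing boundary is not modeled.

```python
from typing import Dict, List


def ri_components(modules: Dict[int, List[str]], net: Dict[str, str]) -> List[List[int]]:
    """Group modules into components; two modules are joined if some net has pins on both.

    Assumed input format: modules maps a module id to its pins, net maps a pin to its net.
    """
    parent = {m: m for m in modules}

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    seen_on: Dict[str, int] = {}  # net name -> first module found carrying one of its pins
    for m, pins in modules.items():
        for p in pins:
            n = net[p]
            if n in seen_on:
                a, b = find(m), find(seen_on[n])
                if a != b:
                    parent[a] = b
            else:
                seen_on[n] = m

    groups: Dict[int, List[int]] = {}
    for m in modules:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())
```

For instance, `ri_components({1: ["a", "A"], 2: ["b"], 3: ["B"]}, {"a": "a", "A": "a", "b": "b", "B": "b"})` returns `[[1], [2, 3]]`: module 1 only carries net a, while net b joins modules 2 and 3.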


4.3     The  Algorithm 


Our algorithm to obtain a topological routing of a component uses Lemmas 2 and 3 to detect infeasibility. The algorithm is given in Figure 4.8. As stated, it only produces an ordering of the wires such that, when the wires are topologically routed one at a time in this order, there is always a path between the two end points of the wire currently being routed that neither intersects previously routed wires nor crosses any of the modules. This is sufficient to obtain the actual topological routing.

Algorithm Testing_Planar_Routability

Step 1: Let m be any module of the component and let p be any pin of m.

Step 2: Examine the pins of m in counterclockwise order beginning at pin p. When a pin q is being examined, compare net(q) and net(r), where r is the pin (if any) at the top of stack A. If stack A is empty or net(q) ≠ net(r), then add q and the remaining pins of m to the top of stack A. Otherwise, output (q, r) and unstack r from A.

Step 3: If both stacks A and B are empty, then terminate.

Step 4: Let r be the pin at the top of stack A. Let s be the pin such that net(r) = net(s).

(a) If s is at the top of stack B, then [output (r, s); unstack r from A and s from B; go to the start of Step 3].

(b) If s is in stack B but not at the top, then [output("The RI is not planar routable"); terminate].

(c) If s is in stack A, then [unstack r from A; add r to stack B; go to the start of Step 4].

(d) If s is in neither of the stacks, then [set p to s; let m be the module containing s; go to Step 2].

Figure 4.8. Topological routing

Our algorithm employs two stacks, A and B. Stack A maintains a pin sequence that defines a curve of the RI. Stack B is used to retain pins that define closed curves with respect to a (sub)curve on stack A. We describe the working of the algorithm with the aid of an example (Figure 4.9(a)). There are four modules, 1-4, and 16 pins, a-h and A-H; $\mathit{net}(p) = p$ if p is a lowercase letter and $\mathit{net}(p) = \mathit{lowercase}(p)$ if p is an uppercase letter. Suppose we begin in step 1 with m = 3 and p = B. Then, in step 2, B, A, F, E and C get stacked, in that order, onto stack A. This corresponds to the curve of Figure 4.10(a). Pin c is in neither of the stacks, so in step 4(d) we set m = 1 and p = c, and go to step 2. In this step, the wires Cc, Ee and Ff are output for routing, and the pins g and D are put on the top of stack A. The curve traced so far is shown in Figure 4.10(b); the routed wires are also shown as a curve. Note that these wires have to be routed using the procedure find_route; otherwise they can enclose a non-empty region. The curve is then extended to module 2, and stack A has the configuration BAghb. The wire Dd is output for routing. The curve now has the form shown in Figure 4.10(c). The curve cannot be extended further, as both end points of wire Bb are on the stack. This means that we have detected a closed curve of the RI. The detected curve is that of Figure 4.10(c). We defer the routing of Bb until we have verified Lemmas 2 and 3 for this closed curve. The deferment also ensures that the current topological routing does not contain a closed line. If Bb were routed now, then the wires Bb, Cc, and Dd together with the boundaries of modules 1, 2, and 3 would define a closed line that encloses a non-empty region. This could cause future routing problems, as there would be no path between a point in the region and one outside it. For example, if the routing of Figure 4.11 is used, then there is no path between a and A, as a is in the enclosed (shaded) region while A is outside of it. The routing of Bb is deferred by saving b on stack B. The curve of stack A is extended to module 4 via the wire hH, which is output for routing. The wires gG and Aa are also output for routing. Now stack A contains the pin B and stack B contains the pin b. The curve is shown in Figure 4.10(d). Finally, the wire Bb is output for routing and, since both stacks are empty, the algorithm terminates successfully in step 3.

Figure 4.9. Example RI: (a) example RI; (b) a possible topological routing

Figure 4.10. Illustration of the routing sequence
Figure 4.11. Trapped terminal and module


The routing order is Cc, Ee, Ff, Dd, hH, gG, Aa and Bb. Let us try this out on our example. We see that no matter how Cc is routed, there will remain a routing path for the remaining wires. The routing of Dd and hH cannot create any enclosed regions and so cannot affect the feasibility of future routes. When Ee and Ff are routed, an enclosed region can be formed. Hence these wires have to be routed using the procedure find_route of Figure 4.13; otherwise they can enclose a non-empty region. The topologically routed RI can be found in Figure 4.9(b).
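The pin-walking part of find_route (Figure 4.13) can be separated from the geometry. The Python sketch below only records which boundary arcs and which previously routed wires the route for $(r, s)$ would follow. It assumes that modules list their pins counterclockwise (so the clockwise neighbor of the pin at index i is the pin at index i - 1) and that a hypothetical dictionary `mate` gives the other pin of each two-pin net. The name `find_route_trace` is illustrative, and the sketch does not check that the wires it follows have actually been routed.

```python
from typing import Dict, List, Tuple


def find_route_trace(
    modules: Dict[int, List[str]], mate: Dict[str, str], r: str, s: str
) -> List[Tuple[str, str, str]]:
    """List the boundary arcs and existing wires that find_route(r, s) follows.

    Assumed input: pins of each module in counterclockwise order; mate pairs two-pin nets.
    """
    where = {p: (m, i) for m, pins in modules.items() for i, p in enumerate(pins)}

    def clockwise_pin(p: str) -> str:
        m, i = where[p]
        pins = modules[m]
        return pins[(i - 1) % len(pins)]  # pins are stored counterclockwise

    trace: List[Tuple[str, str, str]] = []
    current = s
    l_pin = clockwise_pin(current)
    # By (A3), currentpin never repeats, so the loop body runs at most len(where) times.
    for _ in range(len(where) + 1):
        if l_pin == r:
            trace.append(("boundary", current, r))  # complete the route to r
            return trace
        trace.append(("boundary", current, l_pin))  # Step 1: follow the module boundary
        current = mate[l_pin]  # Step 2: the other pin of l's net
        trace.append(("wire", l_pin, current))  # Step 3: hug the existing route
        l_pin = clockwise_pin(current)  # Step 4
    raise ValueError("route not found: conditions (A1)-(A3) are violated")
```

With the pin orders assumed for Figure 4.9 and `mate` mapping each letter to its other case, `find_route_trace(mods, mate, "B", "b")` walks from b to h along module 2, follows wire hH, walks from H to a along module 4, follows wire aA, and finishes on module 3 from A to B. Both followed wires precede Bb in the routing order above.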

Lemma 5: If algorithm Testing_Planar_Routability terminates in step 3, then the input instance is topologically routable.


Proof: We shall show that the algorithm Testing_Planar_Routability maintains the following invariant:

There is a topological routing of all wires output so far such that each remaining wire is (individually) topologically routable.
Figure 4.12. To illustrate conflict


This is true when we start since, at this time, no wires have been output and, for each wire, there is a routing path between its two pins. Assume that the invariant holds just before some wire $(r, s)$ is output. We shall show that the invariant holds after this wire is output for routing. Wire $(r, s)$ satisfies exactly one of the following:

(a) It is output in step 2, r is a pin that was on stack A, and s is the first pin to be reached on its module.

(b) It is output in step 2, r was on stack A, and s is not the first pin to be reached on its module.

(c) It is output in step 4(a). At the time of output, r was at the top of stack A and s at the top of stack B.
If we are in case (a), then since $\mathit{module}(s)$ is reached for the first time, no matter how wire $(r, s)$ is routed at this time, no new enclosed regions are formed. Hence all remaining wires remain routable.

The proofs for cases (b) and (c) are similar; we consider case (c) only. From the algorithm, it follows that at some time prior to the output of $(r, s)$, both r and s were on stack A, with s at the top of A and about to be moved to stack B. The pins on stack A beginning with r and ending at s define a closed curve C (as $\mathit{net}(r) = \mathit{net}(s)$). Let these pins be from modules $\mathit{module}(r) = M_1, M_2, \ldots, M_k = \mathit{module}(s)$ (in this order, moving up stack A). Let p be any pin in $\mathit{pins}(C) - \{r, s\}$ and let q be such that $\mathit{net}(p) = \mathit{net}(q)$. We may assume that either $q \in \mathit{pins}(C)$ or $\mathit{module}(q)$ is unvisited at the time s is moved from stack A to stack B. Note that if this is not the case, then q is in stack A and below r at this time. From the working of the algorithm, it follows that when r reaches the top of stack A (as it does by the assumption that $(r, s)$ is output from step 4(a)), p must be on stack B and above s. So, the algorithm should have terminated unsuccessfully in step 4(b), contradicting the assumption of termination in step 3.

Let U be the set of unvisited modules at the time s is transferred from stack A to stack B. By extending our previous argument, we see that the set N of modules visited by the algorithm between the time s is transferred from stack A to stack B and the time $(r, s)$ is output is such that

(1) $N \subseteq U \cup \mathit{modules}(C)$.

(2) All pins in $\mathit{pins}(N \cap U) \cup (\mathit{pins}(C) - \{r, s\})$ have been output for routing.

(3) All pins reached from $N \cap \mathit{modules}(C)$ are in $\mathit{pins}(C) \cup \mathit{pins}(N \cap U)$.

Algorithm find_route(r, s)
begin
    currentpin := s; l := clockwisepin(currentpin);
    while (l ≠ r) do
    begin
        Step 1: Route clockwise from currentpin to l following the module boundary
        Step 2: currentpin := q such that net(q) = net(l)
        Step 3: Continue the route from l to currentpin following the existing route closely
        Step 4: l := clockwisepin(currentpin)
    end
    Complete the route from currentpin to l = r following the module boundary
end

Figure 4.13. Algorithm to find routing path between pins r and s

We now claim that algorithm find_route(r, s) obtains a topological routing of the wire $(r, s)$ that preserves the invariant. To establish this, we need to show that

(A) The algorithm actually finds the route between r and s.

(B) The region enclosed by this route and the curve C contains no pins that have not been routed to.

To prove (A), we need to show:

(A1) For each value of currentpin, clockwisepin(currentpin) (i.e., the pin clockwise from currentpin) is defined and different from currentpin.

(A2) The net (l, currentpin) in step 3 of Figure 4.13 is already routed, so it is possible to follow this route.

(A3) currentpin does not assume the same value twice.

For (A1), we simply assume that each module has at least two pins. Modules with a single pin may be ignored initially and routed to after the remaining routes have been made. For (A2), let the values of currentpin and l at the start of the $i$th iteration of the while loop be $c_i$ and $l_i$, respectively. We note that $c_1 = s$ and $l_1 \in \mathit{pins}(C)$. If $l_i \in \mathit{pins}(C) \cup \mathit{pins}(N \cap U)$, then from conditions (2) and (3), it follows that $c_{i+1} \in \mathit{pins}(C) \cup \mathit{pins}(N \cap U)$ and wire $(l_i, c_{i+1})$ has been routed. Suppose there is an $l_i \notin \mathit{pins}(C) \cup \mathit{pins}(N \cap U)$. Let $l_j$ be the first such $l_i$. Since $j > 1$, $l_{j-k}$ and $c_{j-k+1}$ are in $\mathit{pins}(C) \cup \mathit{pins}(N \cap U)$ for $k \ge 1$. Since $c_j$ and $l_j$ are on the same module, it follows that $\mathit{module}(c_j) \notin N$. So, $l_j \in \mathit{ext\_pins}(C)$ and $c_j \in \mathit{pins}(C)$. From the way algorithm Testing_Planar_Routability works, it follows that $(l_{j-1}, c_j)$ is a segment of the curve C and that curve C, when oriented from r to s, first reaches $l_{j-1}$ and then $c_j$ via wire $(l_{j-1}, c_j)$. Hence $l_{j-1} \in \mathit{pins}(C)$. Since $l_{j-2} \in \mathit{pins}(C)$ (by the choice of j), $c_{j-1} \in \mathit{pins}(C) \cup \mathit{pins}(N \cap U)$. Further, since $c_{j-1}$ lies on a module of C, $c_{j-1} \in \mathit{pins}(C)$. Now, since $(l_{j-1}, c_j)$ is a segment of C and $c_{j-1}$ is one pin counterclockwise from $l_{j-1}$ and a pin of C, it follows that $(l_{j-2}, c_{j-1})$ is a segment of C oriented from $l_{j-2}$ to $c_{j-1}$. Continuing in this way, we conclude that $(l_1, c_2)$ is a segment of C oriented from $l_1$ to $c_2$. However, we know that when C is oriented from r to s, there is only one wire segment that includes a pin of $\mathit{module}(s)$, and this segment is oriented toward $\mathit{module}(s)$. That is, the orientation is from $c_2$ to $l_1$, a contradiction. Hence, there is no $l_i \notin \mathit{pins}(C) \cup \mathit{pins}(N \cap U)$. Also, no $l_i \in \{r, s\}$ at the start of a while loop iteration. From condition (2), it follows that all encountered wires $(l_i, c_{i+1})$ have been routed.

For (A3), suppose that $c_i = c_j$ for some i and j, $i < j$. Since $(l_{i-1}, c_i)$ and $(l_{j-1}, c_j)$ are two-pin nets, it follows that $l_{i-1} = l_{j-1}$. Now, since $c_{i-1}$ and $c_{j-1}$ are, respectively, one pin counterclockwise from $l_{i-1}$ and $l_{j-1}$, it follows that $c_{i-1} = c_{j-1}$. Continuing in this way, we see that $s = c_1 = c_{j-i+1}$. This implies that net $(l_{j-i}, c_{j-i+1}) = (r, s)$ has already been routed. But it has not. So, no $c_i$ is repeated.

(B) follows from the fact that find_route reaches only pins in $\mathit{pins}(C) \cup \mathit{pins}(N \cap U)$, from condition (3), and from the fact that find_route follows existing routes without enclosing any new pins. □

Lemma 6: If the algorithm Testing_Planar_Routability (given in Figure 4.8) terminates in step 4(b), the RI is not planar routable.

Proof: If the algorithm terminates in step 4(b), then let r and s be as in step 4: r is at the top of stack A and s is in stack B but not at the top. Let x be at the top of stack B and let y be the pin such that $\mathit{net}(y) = \mathit{net}(x)$. Pin y must currently be on stack A, as x can be put on stack B (see step 4(c)) only if y is on stack A. When one pin of a net is in stack A and the other in stack B, the pins can leave the stacks together (step 4(a)) or not at all. Since x is on stack B at termination, y must still be on stack A and hence must be lower than r (as r is at the top). So, there is a curve $y \ldots r$ in the RI. Furthermore, curves $y \ldots r \ldots s$ and $y \ldots r \ldots x$ must exist, as this is the only way s and x can get to stack A and then to stack B. Figure 4.12(a) shows an example curve $y \ldots r \ldots s$. This figure assumes that $\mathit{module}(s) \neq \mathit{module}(r)$; the proof for the case $\mathit{module}(s) = \mathit{module}(r)$ is similar. Let m be the module at which the curves $y \ldots r \ldots s$ and $y \ldots r \ldots x$ diverge (Figure 4.12(b)). Note that m may be $\mathit{module}(r)$ or a later module on the curve $y \ldots r \ldots s$. Let u be the last pin of m on curve $y \ldots r \ldots s$ and let v be the corresponding pin for $y \ldots r \ldots x$. Since all nets are two-pin nets, $u \neq v$. Since x is above s in stack B, v must be on the curve $y \ldots r \ldots s$. The curve $C = y \ldots r \ldots v \ldots x$ is a closed curve. We see that $r \in \mathit{pins}(C)$, $s \in \mathit{ext\_pins}(C)$, and $\mathit{net}(s) = \mathit{net}(r)$. So, s and r satisfy the conditions of Lemma 3, and the RI is not planar routable. □

Theorem 6: The algorithm Testing_Planar_Routability (given in Figure 4.8) is correct.

Proof: Follows from Lemmas 5 and 6. □

The algorithm of Figure 4.8 is easily implemented to have complexity $O(n)$, where n is the total number of pins. For this, we need an array status[1..n] to maintain the current status (i.e., on stack A, on stack B, or on neither) of each pin.

4.4     Topological Routability of Multi-pin Nets

We shall refer to the extension of Testing_Planar_Routability, or TPR, to the case where some or all nets may have more than two pins as MTPR. The MTPR problem may be solved in linear time by mapping MTPR instances into graph planarity instances [15, 18]. However, the known linear time algorithms [9] for graph planarity are complex, and one is motivated to explore the possibility that simpler algorithms exist for MTPR (just as they do for TPR). Unfortunately, this is not the case. We show, in Theorem 7, that any algorithm for MTPR can be used to test graph planarity with no increase in complexity. In Theorem 8, we show that the problem of determining the maximum number of topologically routable nets of an MTPR instance is NP-hard. For the case where all the nets are two-pin nets, we can use the construction of [18] and the algorithm of [28] to find the maximum subset that is topologically routable. The complexity of the resulting algorithm is O(n^), where n is the total number of nets. Theorem 7 motivates the quest for a simple linear time algorithm for a restricted version of MTPR. We show that the algorithm of Figure 4.8 may be extended to handle MTPR instances in which every pair of modules remains connected (though not necessarily by a net) when all nets other than two-pin nets are eliminated.

Theorem 7: Let I be an instance of graph planarity. I can be transformed, in linear time, into an instance $I'$ of MTPR such that I is planar iff $I'$ is topologically routable.

Proof: From the constructions of [15, 18], it follows that the topological routability of an MTPR instance does not depend on the specific placement of the modules. Hence, in constructing $I'$, we need not specify the module placement. $I'$ is obtained from I by replacing each edge $(i, j)$, $i < j$, of I by a module $M_{ij}$ with two pins $M^i_{ij}$ and $M^j_{ij}$. The nets of $I'$ are $N_i = \{M^i_{ij} \mid i < j\} \cup \{M^i_{ji} \mid j < i\}$, $1 \le i \le n$, where n is the number of vertices in I.

If $I'$ is planar routable, then each net $N_i$ has a planar realization that does not contact the realization of any other net. This realization connects the pins of $N_i$ together, possibly using some Steiner points (see Figure 4.14(a)).
Figure 4.14. Realization of a planar net using one Steiner point


If $N_i$ is a two-pin net, then introduce vertex $v_i$ anywhere on the wire connecting the two pins of $N_i$. If $N_i$ has more than two pins, then by using the transformations of Figure 4.14(b), we can reduce the number of Steiner points to one and also ensure that each pin of $N_i$ has exactly one wire connected to it. These transformations preserve the planarity of the routing. The sole surviving Steiner point is replaced by the vertex $v_i$.

Now each wire that connects to $v_i$ connects to module $M_{ij}$ or $M_{ji}$. This module has another wire connecting to vertex $v_j$. Remove $M_{ij}$ (or $M_{ji}$) and join the ends of these two wires together by a line joining the terminals of $M_{ij}$ (or $M_{ji}$). We now have a planar embedding of I.

If I is planar, then start with its planar embedding. Replace $v_i$ by a Steiner point; place $M_{ij}$ anywhere on the embedding of edge $(i, j)$, $i < j$; split the edge $(i, j)$ at $M_{ij}$ and connect the two ends (at the split point) to the terminals of $M_{ij}$. This yields a topological routing of $I'$.
Hence, I is planar iff $I'$ is topologically routable. □
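The construction in the proof of Theorem 7 is mechanical, and its running time is clearly linear in the size of I. A possible Python rendering follows. It assumes vertices are numbered 1..n and edges are given as pairs; the string names chosen for modules and pins (for example `M1,2^1` standing for $M^1_{12}$) and the name `graph_to_mtpr` are purely illustrative. As in the proof, no module placement is produced.

```python
from typing import Dict, List, Tuple


def graph_to_mtpr(
    n: int, edges: List[Tuple[int, int]]
) -> Tuple[Dict[str, List[str]], Dict[int, List[str]]]:
    """Build I' from a graph I: one two-pin module per edge, one net N_i per vertex i.

    Assumed input: vertices 1..n, edges as (u, v) pairs; names below are illustrative.
    """
    modules: Dict[str, List[str]] = {}
    nets: Dict[int, List[str]] = {i: [] for i in range(1, n + 1)}
    for u, v in edges:
        i, j = min(u, v), max(u, v)
        name = f"M{i},{j}"
        pin_i, pin_j = f"{name}^{i}", f"{name}^{j}"  # stand for M^i_ij and M^j_ij
        modules[name] = [pin_i, pin_j]
        nets[i].append(pin_i)  # pin_i belongs to net N_i
        nets[j].append(pin_j)  # pin_j belongs to net N_j
    return modules, nets
```

For the complete graph on four vertices this gives six two-pin modules and four nets with three pins each; since that graph is planar, the resulting MTPR instance must be topologically routable.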

Let MTPRmax be the problem of determining whether or not k of the nets of an MTPR instance are topologically routable. To show that MTPRmax is NP-complete, we use the following problem, which is known to be NP-complete [7].

Planar Subgraph: Given a graph $G = (V, E)$ and a positive integer $k \le |V|$, is there a subset $V' \subseteq V$ with $|V'| \ge k$ such that the subgraph induced by the vertices in $V'$ is planar?

Theorem 8: MTPRmax is NP-complete.

Proof: It is easy to see that MTPRmax is in NP. Also, from an instance I of the planar subgraph problem, we can construct an instance $I'$ of MTPRmax by replacing edges by modules as in Theorem 7. It is easy to see that $I'$ has k nets that are topologically routable iff I has an induced subgraph with k vertices that is planar. □

Any instance I of MTPR may be transformed into an instance $I'$ of TPR that includes unordered modules (i.e., modules whose terminals may be rearranged at will). $I'$ has the property that there is an arrangement of terminals for each of the unordered modules that makes $I'$ topologically routable iff I is topologically routable. To obtain $I'$ from I, for each multipin net $N_i$ of size k, $k > 2$, we introduce an unordered module $UM_i$ with k pins. The net $N_i$ is replaced by k two-pin nets; one pin of each of these nets is an original pin of $N_i$ and the other a pin of $UM_i$. Since planar routability is not affected by module placement, $UM_i$ may be placed anywhere in the routing region. An example is given in Figure 4.15.
Figure 4.15. Transformation from multipin to two-pin nets
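The transformation just described can be sketched in a few lines. The Python below is illustrative only: it assumes nets are given as a pin-to-net dictionary, names the unordered module for net N as `UM[N]`, and invents names for the new pins and two-pin nets. Because the pins of an unordered module have no fixed order, the order of the returned pin lists carries no meaning.

```python
from collections import Counter
from typing import Dict, List, Tuple


def multipin_to_two_pin(net: Dict[str, str]) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Replace every net with k > 2 pins by k two-pin nets to a new unordered module.

    Assumed naming: module "UM[N]" for net N, pins "UM[N].0", "UM[N].1", ...
    """
    size = Counter(net.values())
    unordered: Dict[str, List[str]] = {}  # UM name -> its k pins (order is arbitrary)
    new_net: Dict[str, str] = {}
    for p, n in net.items():
        if size[n] > 2:
            um_pins = unordered.setdefault(f"UM[{n}]", [])
            q = f"UM[{n}].{len(um_pins)}"  # fresh pin on the unordered module
            um_pins.append(q)
            new_net[p] = new_net[q] = f"{n}/{p}"  # the two-pin net {p, q}
        else:
            new_net[p] = n  # nets with at most two pins are kept unchanged
    return unordered, new_net
```

Applied to the four-pin net w, x, y, z of Figure 4.17, the sketch creates one unordered module with four pins and four new two-pin nets, and leaves the original two-pin nets untouched.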


Theorem 9: The pins of each unordered module of $I'$ can be ordered so that the resulting instance of TPR is topologically routable iff I is topologically routable.

Proof: If the pins of the $UM_i$'s can be so ordered, then a topological routing of I is easily obtained from the topological routing of $I'$ (simply replace each $UM_i$ by a Steiner point). If I is topologically routable, then, using transformations similar to those in Figure 4.14(b), we may transform the topological routing into one in which each multipin net $N_i$ of size $k > 2$ is routed using exactly one Steiner point. This Steiner point is replaced by module $UM_i$, and the pin ordering is determined by the topological routing around the Steiner point. □

If we knew which terminal orderings of the $UM_i$'s to use, we could simply convert each $UM_i$ to an ordered module and run the TPR algorithm. Unfortunately, we do not know this. Therefore, we need to modify algorithm Testing_Planar_Routability so as to properly handle unordered modules. As in section 3, we may assume that I is a single component. For our modification to work, we assume that I remains a single component when all multipin nets $N_i$ of size $k > 2$ are eliminated from I. The modified algorithm MTPR is given in Figure 4.16.

Algorithm MTPR

Step 1: For each multipin net N_i of size k > 2, introduce a new unordered module UM_i with k pins and replace N_i by k two-terminal nets as described earlier.

Step 2: Let m be any ordered module and let p be any pin of m which corresponds to an original two-pin net.

Step 3: Examine the pins of m in counterclockwise order beginning at pin p. When a pin q is being examined, compare net(q) and net(r), where r is the pin (if any) at the top of stack A. If stack A is empty or net(q) ≠ net(r), then add q and the remaining pins of m to the top of stack A. Otherwise, output (q, r) and unstack r from A.

Step 4: If both stacks A and B are empty, then terminate.

Step 5: Let r be the pin at the top of stack A. Let s be the pin such that net(r) = net(s). If module(s) is an unordered module, then go to Step 6.

(a) If s is at the top of stack B, then [output (r, s); unstack r from A and s from B; go to the start of Step 4].

(b) If s is in stack B but not at the top, then [output("The RI is not planar routable"); terminate].

(c) If s is in stack A, then [unstack r from A; add r to stack B; go to the start of Step 5].

(d) If s is in neither of the stacks, then [set p to s; let m be the module containing s; go to Step 3].

Step 6:

(a) If module(s) is at the top of stack B, then [output (r, s); unstack r from A; mark pin s as having been seen; if all pins of module(s) have been marked, then unstack module(s) from B; go to the start of Step 4].

(b) If module(s) is on B but not at the top, then [output("The RI is not planar routable"); terminate].

(c) If module(s) is not in stack B, then [unstack r from A; mark pin s as having been seen; add module(s) to the top of stack B; go to the start of Step 5].

Figure 4.16. Topological routing of multipin nets for the restricted version

The working of the MTPR algorithm is explained using the example in Figure 4.17. There are five modules, 1-5, and seven nets, of which six are two-pin nets. The seventh is a four-pin net whose pins are w, x, y and z. In step 1 of the algorithm of Figure 4.16, we replace this four-pin net with a new unordered module UM, which has four pins. Suppose we begin in step 2 with m = 5 and p = A. Then, in step 3, A, C, x and B get stacked, in that order, onto stack A. This corresponds to the curve of Figure 4.18(a). The top of stack A now holds pin B, and the curve on stack A is extended by adding pins from module 1 to stack A. In this process, the wire Bb is output for routing. This situation is depicted in Figure 4.18(b). Since pin a is at the top of stack A and its mate is below it in the stack, pin a is moved from stack A to stack B. At the top of stack A, we now have pin y of the four-pin net; since this net is seen for the first time, we add the unordered module to the top of stack B and mark y as having been seen. We route the wire from pin y to the unordered module. The pin x, which is below pin y, is routed next to the unordered module. This is done using the procedure find_route of Figure 4.13. The curve on stack A is then extended by adding pins from module 4. At this time, wire Cc is output for routing. This scenario is depicted in Figure 4.18(c). Now, pin z is at the top of stack A. This pin is routed to the unordered module in step 6(a) of the algorithm. Next, in step 3, the wire dD is output for routing, and pin E is put on the top of stack A. At this point, stack A contains pins A, f and E, bottom to top in that order. This situation is depicted in Figure 4.18(d). We set m = 2 and p = e in step 5(d) and, in step 3 of the algorithm, output wires Ee and fF for routing. The remaining pin w is put at the top of stack A. In step 6(a) of the algorithm, we mark pin w as seen and route a wire from this pin to the unordered module. Also, we remove the unordered module from the top of stack B. This is shown in Figure 4.18(e). Now, stack A contains pin A and stack B contains pin a, and the wire Aa is output for routing in step 5(a) of the algorithm. Both stacks are now empty, and the algorithm terminates successfully in step 4. The topologically routed RI can be found in Figure 4.17.

Figure 4.17. Example RI with a four-pin net
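Algorithm MTPR differs from Testing_Planar_Routability only in Steps 1 and 6, so a code rendering extends naturally. The Python below is again a hedged sketch rather than the thesis implementation. Its assumptions: ordered modules list their pins counterclockwise; multipin nets are recognized by their size; each unordered module is represented by a single marker `UM[N]` on stack B plus a count of its unmarked pins, which suffices because its pin order is only fixed as routing proceeds; and, following the walk-through (pin y is routed when first seen) rather than the literal text of Step 6(c), the first pin sent to an unordered module is also output. The function name and data layout are hypothetical, and the pin orders for Figure 4.17 are inferred from the walk-through.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def mtpr(
    modules: Dict[int, List[str]], net: Dict[str, str], start_module: int, start_pin: str
) -> Optional[List[Tuple[str, str]]]:
    """Return the routing order of algorithm MTPR, or None if it reports failure."""
    module_of = {q: k for k, pins in modules.items() for q in pins}
    size = Counter(net.values())
    # Step 1 (assumed encoding): a pin of a net with k > 2 pins is wired to marker UM[N].
    um = {q: f"UM[{n}]" for q, n in net.items() if size[n] > 2}
    unmarked = {f"UM[{n}]": k for n, k in size.items() if k > 2}
    by_net: Dict[str, List[str]] = {}
    for q, n in net.items():
        if q not in um:
            by_net.setdefault(n, []).append(q)
    mate: Dict[str, str] = {}
    for q1, q2 in by_net.values():  # every remaining net must have exactly two pins
        mate[q1], mate[q2] = q2, q1

    stack_a: List[str] = []
    stack_b: List[str] = []  # pins and unordered-module markers
    state: Dict[str, str] = {}  # pin or marker -> "A", "B" or "done"
    order: List[Tuple[str, str]] = []

    def step3(m: int, p: str) -> None:
        pins = modules[m]
        start = pins.index(p)
        for j in range(len(pins)):
            q = pins[(start + j) % len(pins)]
            if stack_a and mate.get(q) == stack_a[-1]:
                r = stack_a.pop()
                order.append((q, r))
                state[q] = state[r] = "done"
            else:
                for t in range(j, len(pins)):
                    x = pins[(start + t) % len(pins)]
                    stack_a.append(x)
                    state[x] = "A"
                return

    step3(start_module, start_pin)  # Step 2: start_pin must belong to a two-pin net
    while stack_a or stack_b:  # Step 4
        if not stack_a:
            raise ValueError("RI splits once multipin nets are removed")
        r = stack_a[-1]  # Step 5
        if r in um:  # Step 6: the mate of r lies on an unordered module
            u = um[r]
            if stack_b and stack_b[-1] == u:  # 6(a)
                stack_a.pop()
                order.append((r, u))
                state[r] = "done"
                unmarked[u] -= 1
                if unmarked[u] == 0:
                    stack_b.pop()
                    state[u] = "done"
            elif state.get(u) == "B":  # 6(b)
                return None
            else:  # 6(c): first pin of this net; routed as in the walk-through
                stack_a.pop()
                order.append((r, u))
                state[r] = "done"
                unmarked[u] -= 1
                stack_b.append(u)
                state[u] = "B"
            continue
        s = mate[r]
        if stack_b and stack_b[-1] == s:  # 5(a)
            stack_a.pop()
            stack_b.pop()
            order.append((r, s))
            state[r] = state[s] = "done"
        elif state.get(s) == "B":  # 5(b)
            return None
        elif state.get(s) == "A":  # 5(c)
            stack_b.append(stack_a.pop())
            state[r] = "B"
        else:  # 5(d)
            step3(module_of[s], s)
    return order
```

With module 5 listed as A, C, x, B, module 1 as b, y, a, module 4 as c, f, D, z, module 3 as d, E and module 2 as e, F, w, a call starting at module 5 and pin A outputs Bb, then y and x to the unordered module, then Cc, z, dD, Ee, fF, w and finally Aa, which is the sequence described above.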

Lemma 7: If the algorithm MTPR halts in step 4, then the wires are planar routable.


Proof: The proof is very similar to that of Lemma 5; the same invariant holds. When we put an unordered module on stack B and mark a pin (if that is the first pin marked), we connect this pin to the unordered module. Observe that this creates no enclosed region, so the invariant continues to hold. Now, if we are routing another pin of the multipin net, then the pin to which it has to be connected is on the unordered module. So, as soon as we reach the unordered module, the next pin of the module is chosen as the pin to which it is connected (this also defines the order of pins in the unordered module). The proofs of Lemma 5 apply in this case, as we have made all nets two-pin nets. □

Figure 4.18. Illustration of the routing sequence with a multiterminal net

Lemma 8: Let I be an RI that contains a curve $C = P_1 P_2 \ldots P_j$. Let $R = R_1 R_2 \ldots R_k$ and $S = S_1 S_2 \ldots S_l$ be two curves such that $\mathit{module}(R_1) = \mathit{module}(P_d)$ for some d, $1 \le d \le j$, and $\mathit{first}(R_1) \in \mathit{ext\_pins}(C)$, and $\mathit{module}(S_1) = \mathit{module}(P_e)$ for some e, $1 \le e \le j$, and $\mathit{first}(S_1) \in \mathit{pins}(C)$. Let C be such that $\mathit{first}(P_1)$ and $\mathit{last}(P_j)$ are part of the same net N. Assume that there exist two pins a and b such that $a \in \mathit{pins}(C) \cup \mathit{pins}(S)$, $b \in \mathit{ext\_pins}(C) \cup \mathit{pins}(R)$, and $\mathit{net}(a) = \mathit{net}(b) \neq N$.

Then I is not planar routable.

Proof: Follows from Lemma 2. □

Lemma 9: If algorithm MTPR terminates in step 5(b) or 6(b), the RI is not planar routable.

Proof : Suppose the algorithm terminates in step 5(b). Let r and s be as in step 5, and let x be at the top of stack B. Note that r and s define a two-pin net. If x is a pin of a two-pin net, then the RI is not planar routable (see the proof of Lemma 5). So assume that x is an unordered module (note that only pins of two-pin nets and unordered modules get onto stack B). Module x must have at least one marked and one unmarked pin. Let C be the curve defined by the stack A segment from r to s when s was at the top of stack A just prior to being transferred to stack B. From the working of MTPR, it follows that there is a pin $p \in \mathrm{pins}(C)$ from which a path was traced to the multipin net corresponding to module x. Furthermore, there is at least one pin a of the multipin net that lies on a path from a pin that is not in $\mathrm{pins}(C)$. A possible situation is shown in Figure 4.19. The conditions of Lemma 8 are satisfied, and the RI is not planar routable.

If the algorithm terminates in step 6(b), then r is a pin of a multipin net and $\mathrm{module}(s)$ is an unordered module. Let x be at the top of stack B at the time of termination. Let j be one of the pins that have already been routed to $\mathrm{module}(s)$. Let C be the curve defined by the stack A segment at the time pin j was output for routing. Let S be the curve or pin in $\mathrm{pins}(C)$ that was used to reach pin x. Let y be such that $\mathrm{net}(x) = \mathrm{net}(y)$. Since y must be below r on stack A and r is a pin of a multipin net, the path from y to r on stack A must include a pin in $\mathrm{ext\_pins}(C)$. By setting a and b of Lemma 8 to x and y, respectively, we see that the conditions of Lemma 8 are satisfied and the RI is unroutable. The proof for the case in which x is an unordered module is similar.    □

Figure 4.19. A possible situation in which the RI is unroutable

4.5     Implementation  of  Two-Pin  Algorithm 

While the correctness proof for our algorithm is somewhat involved, the algorithm itself is quite simple and easy to implement. To get good performance, we implemented stack A as a stack of modules rather than as a stack of pins as described in Section 3.3. So, when step 2 of Figure 4.8 adds q and the remaining pins of m to stack A, we simply push a record of the form (m, q, l), where l is the last pin of m, onto the stack. To get the top pin of stack A, we look at the top record (m, q, l); the top pin is l. To delete this pin, the top record is changed to (m, q, p(l)), where p(l) is the predecessor of pin l, unless q = l. In the latter case, the record (m, q, l) is deleted from the stack. The role of array status needs to be changed to support this change in stack structure. We now keep a status for each module as well as for each pin: a module's status reflects whether or not it is in stack A, and a pin's status reflects whether or not it is in stack B.
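The record-based stack A described above can be sketched as follows. This is a minimal Python illustration, not the thesis's C implementation: the names `StackA`, `Record`, `push_module`, `top_pin`, and `pop_pin` are hypothetical, pin predecessors are assumed to be available in a per-module dictionary `pred`, and each module is assumed to occupy at most one record at a time, so that its stack-A status can be cleared when its record is deleted. Push, top, and pop each take O(1) time, which is the point of storing (m, q, l) instead of individual pins.

```python
from dataclasses import dataclass


@dataclass
class Record:
    module: int  # module m
    first: int   # pin q: the first (bottom-most) pin of m covered by this record
    top: int     # pin l: the pin of m currently on top of this record


class StackA:
    """Stack A stored as module records (m, q, l) instead of individual pins.

    pred[m][pin] is the predecessor of `pin` in module m's pin order
    (a hypothetical representation chosen for this sketch).
    """

    def __init__(self, pred):
        self.pred = pred
        self.records = []
        self.module_on_a = set()  # module status: is module m on stack A?

    def push_module(self, m, q, l):
        # One O(1) record stands for pushing pins q, ..., l of m one by one.
        self.records.append(Record(m, q, l))
        self.module_on_a.add(m)

    def is_empty(self):
        return not self.records

    def top_pin(self):
        return self.records[-1].top

    def pop_pin(self):
        rec = self.records[-1]
        pin = rec.top
        if rec.top == rec.first:
            # q == l: the record's last pin is removed, so delete the record.
            self.records.pop()
            self.module_on_a.discard(rec.module)
        else:
            # Otherwise (m, q, l) becomes (m, q, p(l)).
            rec.top = self.pred[rec.module][pin]
        return pin


if __name__ == "__main__":
    # Module 0 has pins 10, 11, 12, 13 in circular order; pushing from q = 11
    # covers pins 11, 12, 13, 10, so the last pin is l = 10.
    a = StackA({0: {10: 13, 11: 10, 12: 11, 13: 12}})
    a.push_module(0, 11, 10)
    print([a.pop_pin() for _ in range(4)])  # [10, 13, 12, 11]
```

Pins come off in the reverse of the order in which they were logically pushed, exactly as with a pin-level stack, but only one record is created per module.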

The two-pin net algorithm of Marek-Sadowska and Tarng [18] is a two-step algorithm:

Step 1: Merge modules together to obtain an equivalent routing problem in which all pins are on the periphery of a routing region.

Step 2: Determine the feasibility of the equivalent problem using a single-stack scheme.
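The single-stack scheme of step 2 rests on a standard observation: when every pin lies on the periphery of one region, a set of two-pin nets can be routed inside it without crossings exactly when no two nets interleave along the periphery, that is, when the net labels nest like parentheses. A minimal sketch of that test follows. It assumes the periphery is supplied as a list of net labels in traversal order, with each label occurring exactly twice; the function name `periphery_routable` is hypothetical and not taken from [18].

```python
def periphery_routable(labels):
    """Return True if the two-pin nets whose pins appear, in periphery order,
    as `labels` can be routed inside the region without crossings.

    Assumes every net label occurs exactly twice.
    """
    stack = []
    for net in labels:
        if stack and stack[-1] == net:
            stack.pop()          # second pin of the innermost open net
        else:
            # First pin of a net, or the second pin of a net that crosses
            # another one; in the latter case this entry can never be popped.
            stack.append(net)
    return not stack


if __name__ == "__main__":
    print(periphery_routable(["a", "b", "b", "a"]))  # True: b nests inside a
    print(periphery_routable(["a", "b", "a", "b"]))  # False: a and b interleave
```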

To  implement  step  1,  we  performed  a  traversal  of  the  modules.  Each  module  was 
represented  as  a  singly  linked  circular  list  of  pins.  With  this  representation,  modules 
can  be  merged  efficiently.  By  contrast,  for  the  algorithm  of  Figure  4.8,  modules  were 
represented  using  doubly  linked  circular  lists. 
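Singly linked circular lists make merging cheap because two cycles can be joined in O(1) time by exchanging the successor pointers of one node from each cycle. The sketch below shows only this generic splice; which pins are spliced, and how the pins of the connecting net are treated, follow [18] and are not reproduced here. The names `Pin`, `make_module`, `splice`, and `walk` are hypothetical.

```python
class Pin:
    """Node of a singly linked circular list of pins (hypothetical layout)."""

    def __init__(self, name):
        self.name = name
        self.next = self  # a single pin forms a one-element cycle


def make_module(names):
    """Build a circular list from pin names and return its first node."""
    pins = [Pin(n) for n in names]
    for cur, nxt in zip(pins, pins[1:] + pins[:1]):
        cur.next = nxt
    return pins[0]


def splice(a, b):
    """Merge the cycles containing pins a and b in O(1) time.

    Swapping a.next and b.next turns a -> a1 ... -> a and b -> b1 ... -> b
    into the single cycle a -> b1 ... -> b -> a1 ... -> a. If a and b were
    already on the same cycle, the same swap would split it instead, so only
    distinct modules should be merged this way.
    """
    a.next, b.next = b.next, a.next


def walk(start):
    """List pin names once around the cycle beginning at `start`."""
    names, cur = [start.name], start.next
    while cur is not start:
        names.append(cur.name)
        cur = cur.next
    return names


if __name__ == "__main__":
    m1 = make_module(["a", "a1", "a2"])
    m2 = make_module(["b", "b1", "b2"])
    splice(m1, m2)
    print(walk(m1))  # ['a', 'b1', 'b2', 'b', 'a1', 'a2']
```

A doubly linked list supports the same splice but must also update predecessor pointers, which is extra work when only forward traversal is needed.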

Figure 4.20. Tree-like connected circuits

The multipin net algorithm of Marek-Sadowska and Tarng [18] has three steps:

Step 1: Merge modules together to obtain an equivalent routing problem in which all pins are on the periphery of a routing region.

Step 2: Traverse the pins and transform multipin nets into two-pin nets.

Step 3: Determine the feasibility of the equivalent problem using a single-stack scheme.
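Step 2 is not spelled out here. One simple way to picture it, assumed for illustration and not necessarily the transformation used in [18], is to chain the pins of each multipin net in the order they are met on the periphery: a net with pins p1, ..., pk becomes the two-pin nets (p1, p2), ..., (pk-1, pk), and a single-pin net yields none. At a shared pin, the two-pin net arriving from the previous pin is closed before the one to the next pin is opened, so the result can be checked with the same single-stack scheme as step 3. The function names `chain_multipin` and `single_stack_routable` are hypothetical.

```python
from collections import Counter, defaultdict


def chain_multipin(labels):
    """Replace each k-pin net by k - 1 two-pin nets joining consecutive pins.

    `labels` lists the net of each pin in periphery order. The result is a
    sequence of two-pin net ids (net, i), each occurring exactly twice, where
    (net, i) joins the i-th and (i+1)-th pins of `net` along the periphery.
    """
    total = Counter(labels)
    seen = defaultdict(int)
    seq = []
    for net in labels:
        i = seen[net]
        seen[net] += 1
        if i > 0:
            seq.append((net, i - 1))  # close the two-pin net from the previous pin
        if i < total[net] - 1:
            seq.append((net, i))      # open the two-pin net to the next pin
    return seq


def single_stack_routable(seq):
    """Single-stack test for two-pin nets given in periphery order."""
    stack = []
    for net in seq:
        if stack and stack[-1] == net:
            stack.pop()
        else:
            stack.append(net)
    return not stack


if __name__ == "__main__":
    ok = ["a", "b", "b", "a", "c", "a"]  # b and c lie between pins of a
    bad = ["a", "b", "a", "b", "a"]      # a and b interleave
    print(single_stack_routable(chain_multipin(ok)))   # True
    print(single_stack_routable(chain_multipin(bad)))  # False
```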


4.6     Experimental  Results 

We implemented our algorithms for two-pin and multipin nets, as well as those of Marek-Sadowska and Tarng [18], in C and obtained execution times on both routable and nonroutable circuits. For the two-pin net case, the routable circuits used were either highly structured ones, such as those shown in Figures 4.20 and 4.21, or randomly generated ones. The nonroutable circuits used were obtained by modifying the tree-like circuits of Figure 4.20.
Figure 4.21. Six-way connected circuits

Figure 4.22. Tree-like connected circuits with multipin nets

Figure 4.23. Eight-way connected circuits with multipin nets


For the multipin net case, we used highly structured circuits such as those shown in Figures 4.22 and 4.23. The nonroutable circuits for the multipin case were obtained by modifying these structured circuits.

The timing results for the routable circuits with two-pin nets are shown in Tables 4.1, 4.2, and 4.3 (tree-like, six-way, and random circuits, respectively). The times are in milliseconds, and the programs were run on a SUN 4 workstation. On tree-like circuits, the algorithm of Marek-Sadowska and Tarng [18] took 65% more time than ours, on average; on six-way circuits, it took approximately 40% more time; and on random circuits, it took approximately 37% more time.

For the multipin net case, the timing results for the routable circuits are shown in Tables 4.5 and 4.6. On tree-like circuits with multipin nets, the algorithm of Marek-Sadowska and Tarng [18] took 295% more time than ours, on average; on eight-way connected circuits with multipin nets, it took approximately 86% more time.


Table 4.1. Tree-like Connected Circuits

    NP    NM   Our (ms)   [18] (ms)
   864    16       3.40        5.45
   704    25       2.91        4.36
  1440    49       5.53        9.10
  6048   100      22.90       37.80
  6944   225      27.40       44.90
 24928   400      93.30      162.80

NP = Number of pins in the circuit
NM = Number of modules in the circuit


Table 4.2. Six-Way Connected Circuits

    NP    NM   Our (ms)   [18] (ms)
  1792    25       8.48       11.08
  1920    49      10.17       11.93
   522   100       4.20        9.52
  8352   100      44.54       55.18
  9856   225      53.60       66.25
 17396   400     105.40      121.90


Table 4.3. Random Circuits

    NP    NM   Our (ms)   [18] (ms)
    92    28       0.91        0.98
  1472    28       6.54        9.18
  2944    28      12.76       18.17
 11776    28      49.95       78.20


Table 4.4. Faster Termination for Nonroutable Circuits

    NP    NM   Our (ms)   [18] (ms)
   868    25       3.70       15.50
  6944   225      16.24       45.70
 13888   225      30.30       92.50

Our algorithm has a distinct advantage over that of Marek-Sadowska and Tarng [18] when working with nonroutable circuits. For two-pin nets, the algorithm of Marek-Sadowska and Tarng [18] must complete its step 1 before it can detect infeasibility, and for multipin nets it must complete both step 1 and step 2 before it can detect infeasibility, whereas our algorithm can detect infeasibility at any stage. Hence, our algorithm can take much less time than that of Marek-Sadowska and Tarng [18] on such circuits. The results for three test circuits of each kind are given in Tables 4.4 and 4.7. The algorithm of Marek-Sadowska and Tarng [18] took approximately 3 to 5 times as long as our algorithm on circuits with only two-pin nets, and 18 to 33 times as long on circuits with multipin nets.


Table 4.5. Eight-way Connected Circuits with Multi-pin Nets

    NP    NM   Our (ms)   [18] (ms)
   516    16       2.43        4.46
  1264    25       5.73       10.84
  2328    49      11.41       21.49
  4644   100       25.3       47.70
 12544   225       72.4       134.7
 22724   400      133.3       243.5


Table 4.6. Tree-like Connected Circuits With Multi-pin Nets

    NP    NM   Our (ms)   [18] (ms)
   738    16       2.47        9.42
  1148    25       3.77       14.45
  1360    36        4.6       17.75
  4068   100      14.62       58.85
 10318   225       39.8       162.6
 20178   400       76.9       316.2


Table 4.7. Faster Termination for Nonroutable Circuits With Multi-pin Nets

    NP    NM   Our (ms)   [18] (ms)
  3708   100       2.46        55.1
  9184   225       5.43       101.8
 23218   400       10.7       364.9

4.7     Conclusion 

We have developed a relatively simple and fast linear-time algorithm to test the planar topological routability of a collection of two-pin nets. The algorithm is faster than the linear-time algorithm of Marek-Sadowska and Tarng [18]. We have also shown the equivalence of the multipin net routability problem and the graph planarity problem. This implies that the multipin problem cannot be solved by an algorithm simpler than those used for graph planarity. Our simple two-pin algorithm can be extended to work on routing instances that remain connected following the elimination of all multipin nets. These are the same instances that can be solved by the specialized algorithm of Marek-Sadowska and Tarng [18]. We also show that
determining  the  maximum  number  of  routable  nets  is  NP-hard  when  some  or  all  of 
the  nets  have  more  than  two  pins.  When  all  nets  are  two-pin  nets,  this  can  be  done 
in  O(n^)  time. 


CHAPTER  5 
CONCLUSIONS  AND  FUTURE  WORK 


For  a  stack  of  equal-width  bit-slice  architectural  components,  we  developed  a 
normalization  technique  which  allowed  us  to  obtain  linear  time  algorithms  for  the 
height-constrained  and  width-constrained  layouts.  These  algorithms  are  at  least  an 
order  of  magnitude  faster  than  the  existing  algorithms  for  the  same  problem. 

We  considered  folding  for  a  list  of  standard  cells  and  custom  cells.  We  consid- 
ered various  optimization  constraints  which  resulted  in  nine  different  problem  formu- 
lations. Optimal  algorithms  were  developed  for  all  these  formulations. 

We have developed a faster algorithm for the topological routability problem with two-pin nets. We have also shown the equivalence of the multipin net topological routability problem and the graph planarity problem. This means that the multipin net topological routability problem cannot be solved by an algorithm simpler than those used for the graph planarity problem. We have also shown that the problem of determining the maximum number of topologically routable nets in the presence of multipin nets is NP-hard.

Experimental results show that the asymptotic superiority of our algorithms translates into lower execution times compared to existing algorithms.
The algorithms developed by Paik and Sahni [21] for a stack of bit-sliced architecture components of unequal width have high time complexity (0(n'') to 0(n^)). This translates into very large execution times even for small instances. Algorithms with lower asymptotic time complexity could be developed.

Effective heuristics could be developed for the problem of determining the maximum number of topologically routable nets in instances with multipin nets.